ABSTRACT OF THE DISCLOSURE

The protease necessary for polyprotein processing in Hepatitis C
virus is identified, cloned, and expressed. Proteases, truncated proteases, and
altered proteases are disclosed which are useful for cleavage of specific
polypeptides, and for assay and design of antiviral agents specific for HCV.
CLAIMS:

1. A purified protease derived from the NS3 region of hepatitis C
virus as shown in Figure 1 or truncations thereof having protease activity.

2. A purified protease according to claim 1 wherein said
protease comprises a partial internal amino acid sequence substantially as shown
in amino acids 135-145 of Figure 1.

3. A purified protease according to claim 1 or 2 wherein said
protease comprises a partial internal amino acid sequence substantially as shown
in amino acids 217-225 of Figure 1.
4. A purified protease derived from the NS3 region of hepatitis C
virus wherein said protease comprises a partial internal amino acid sequence
substantially as shown in amino acids 1-199 of Figure 1.

5. A purified protease according to claim 4 wherein said
protease comprises a partial amino acid sequence substantially as shown in
amino acids 1-299 of Figure 1.

6. A purified protease according to any one of the preceding 
claims wherein said protease comprises the amino acid sequence substantially as 
shown in amino acids 1-686 of Figure 1. 

7. A purified protease according to any one of claims 1 to 3
wherein said protease comprises a partial internal amino acid sequence
substantially as shown in amino acids 60-262 of Figure 1.

8. A purified protease according to any one of the preceding
claims wherein said protease comprises at least one of a histidine, aspartate and
serine residue at positions corresponding to amino acids 1084, 1108 and 1166,
respectively, of the hepatitis C virus polyprotein.

9. A fusion protein comprising a suitable fusion partner fused to 
a protease or polypeptide as defined in any one of the preceding claims. 
10. A fusion protein according to claim 9 wherein said fusion
partner is selected from the group consisting of human superoxide dismutase,
ubiquitin, yeast α-factor, IL-2S, β-galactosidase, β-lactamase, horseradish
peroxidase, glucose oxidase and urease.
11. A polynucleotide encoding a fusion protein comprising a
protease or polypeptide as defined in any one of claims 1 to 8; and a fusion
partner.

12. A polynucleotide according to claim 11 wherein said fusion
partner is selected from the group consisting of human superoxide dismutase,
ubiquitin, yeast α-factor, IL-2S, β-galactosidase, β-lactamase, horseradish
peroxidase, glucose oxidase and urease.

13. An expression vector for producing an hepatitis C virus
protease in a host cell, which vector comprises:

(a) a polynucleotide encoding a protease or polypeptide as defined
in any one of claims 1 to 8;

(b) transcriptional and translational regulatory sequences functional
in said host cell, operably linked to said polynucleotide; and

(c) a selectable marker.

14. A vector according to claim 13 which further comprises a
sequence encoding a fusion partner, linked to said polynucleotide to form a fusion
protein upon expression.

15. A vector according to claim 14 wherein said fusion partner is
selected from the group consisting of human superoxide dismutase, ubiquitin,
yeast α-factor, IL-2S, β-galactosidase, β-lactamase, horseradish peroxidase,
glucose oxidase and urease.

16. A method for assaying compounds for activity against 
hepatitis C virus, which method comprises: 
(a) providing a protease derived from the NS3 region of hepatitis C

virus; 

(b) contacting said protease with a compound capable of inhibiting 
serine protease activity; and 

(c) measuring inhibition of the proteolytic activity of said protease. 
Technical Field

This invention relates to the molecular biology and virology of the
hepatitis C virus (HCV). More specifically, this invention relates to a novel
protease produced by HCV, methods of expression, recombinant protease, protease
mutants, and inhibitors of HCV protease.

Background of the Invention

Non-A, Non-B hepatitis (NANBH) is a transmissible disease (or
family of diseases) that is believed to be virally induced, and is distinguishable
from other forms of virus-associated liver disease, such as those caused by
hepatitis A virus (HAV), hepatitis B virus (HBV), delta hepatitis virus (HDV),
cytomegalovirus (CMV) or Epstein-Barr virus (EBV). Epidemiologic evidence
suggests that there may be three types of NANBH: the water-borne epidemic type;
the blood or needle associated type; and the sporadically occurring (community
acquired) type. However, the number of causative agents is unknown. Recently,
a new viral species, hepatitis C virus (HCV), has been identified as the
primary (if not only) cause of blood-associated NANBH (BB-NANBH). See, for
example, PCT WO89/04669.

Hepatitis C appears to be the
major form of transfusion-associated hepatitis in a number of countries, including
the United States and Japan. There is also evidence implicating HCV in induction
of hepatocellular carcinoma. Thus, a need exists for an effective method for
treating HCV infection: currently, there is none.

Many viruses, including adenoviruses, baculoviruses, comoviruses,
picornaviruses, retroviruses, and togaviruses, rely on specific virally-encoded
proteases for processing polypeptides from their initial translated form into mature,
active proteins. In the case of picornaviruses, all of the viral proteins are believed
to arise from cleavage of a single polyprotein (B.D. Korant, CRC Crit Rev
Biotech (1988) 8:149-57).

S. Pichuantes et al, in "Viral Proteinases As Targets For Chemotherapy"
(Cold Spring Harbor Laboratory Press, 1989) pp. 215-22, disclosed expression
of a viral protease found in HIV-1. The HIV protease was obtained in the form
of a fusion protein, by fusing DNA encoding an HIV protease precursor to DNA
encoding human superoxide dismutase (hSOD), and expressing the product in
E. coli. Transformed cells expressed products of 36 and 10 kDa (corresponding to
the hSOD-protease fusion protein and the protease alone), suggesting that the
protease was expressed in a form capable of autocatalytic proteolysis.

T.J. McQuade et al, Science (1990) 247:454-56 disclosed preparation
of a peptide mimic capable of specifically inhibiting the HIV-1 protease. In HIV,
the protease is believed responsible for cleavage of the initial p55 gag precursor
polyprotein into the core structural proteins (p17, p24, p8, and p7). Adding 1 μM
inhibitor to HIV-infected peripheral blood lymphocytes in culture reduced the
concentration of processed HIV p24 by about 70%. Viral maturation and levels
of infectious virus were reduced by the protease inhibitor.
Disclosure of the Invention

We have now invented recombinant HCV protease, HCV protease
fusion proteins, truncated and altered HCV proteases, cloning and expression
vectors therefor, and methods for identifying antiviral agents effective for treating
HCV.

According to a first aspect of the invention, there is provided a 
purified protease derived from the NS3 region of hepatitis C virus as shown in 
Figure 1 or truncations thereof having protease activity. 

The protease may comprise a partial internal amino acid sequence
substantially as shown in amino acids 135-145 of Figure 1, amino acids 217-225
of Figure 1, amino acids 60-262 of Figure 1 or amino acids 1-686 of Figure 1.

According to a second aspect of the invention, there is provided a
purified protease derived from the NS3 region of hepatitis C virus wherein said
protease comprises a partial internal amino acid sequence substantially as shown
in amino acids 1-199 of Figure 1.

The protease may comprise a partial amino acid sequence
substantially as shown in amino acids 1-299 of Figure 1 or amino acids 1-686 of
Figure 1.

Preferably, the above-described proteases comprise at least one of
a histidine, aspartate and serine residue at positions corresponding to amino
acids 1084, 1108 and 1166, respectively, of the hepatitis C virus polyprotein.

According to a third aspect of the invention, there is provided a 
fusion protein comprising a suitable fusion partner fused to any one of the above- 
described proteases or polypeptides. 




2079105 

Preferably, the fusion partner is selected from the group consisting
of human superoxide dismutase, ubiquitin, yeast α-factor, IL-2S, β-galactosidase,
β-lactamase, horseradish peroxidase, glucose oxidase and urease.

According to a fourth aspect of the invention, there is provided a
polynucleotide encoding a fusion protein comprising any one of the
above-described proteases or polypeptides and a fusion partner.

Preferably, the fusion partner is selected from the group consisting
of human superoxide dismutase, ubiquitin, yeast α-factor, IL-2S, β-galactosidase,
β-lactamase, horseradish peroxidase, glucose oxidase and urease.
According to a fifth aspect of the invention, there is provided an
expression vector for producing an hepatitis C virus protease in a host cell, which
vector comprises:

(a) a polynucleotide encoding any one of the above-described
proteases or polypeptides;

(b) transcriptional and translational regulatory sequences functional
in said host cell, operably linked to said polynucleotide; and

(c) a selectable marker.

Preferably, the vector further comprises a sequence encoding a
fusion partner, linked to said polynucleotide to form a fusion protein upon
expression. The fusion partner may be selected from the group consisting of
human superoxide dismutase, ubiquitin, yeast α-factor, IL-2S, β-galactosidase,
β-lactamase, horseradish peroxidase, glucose oxidase and urease.

According to a sixth aspect of the invention, there is provided a
method for assaying compounds for activity against hepatitis C virus, which
method comprises:

(a) providing a protease derived from the NS3 region of hepatitis C 

virus; 
(b) contacting said protease with a compound capable of inhibiting

serine protease activity; and 

(c) measuring inhibition of the proteolytic activity of said protease. 
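Step (c) amounts to comparing the rate of substrate cleavage with and without the test compound. As a minimal sketch, assuming both rates have already been measured in the same units (for example, product formed per minute in some substrate-cleavage readout, which the text does not specify), percent inhibition can be computed as follows; the function name is illustrative only.

```python
def percent_inhibition(rate_control: float, rate_with_compound: float) -> float:
    """Percent inhibition of protease activity relative to an uninhibited control.

    Both rates must use the same (arbitrary) units, e.g. product formed per minute.
    A value of 0 means no inhibition; 100 means activity was abolished.
    """
    if rate_control <= 0:
        raise ValueError("control rate must be positive")
    return 100.0 * (1.0 - rate_with_compound / rate_control)


if __name__ == "__main__":
    # Hypothetical readings: 120 units/min without the compound, 30 units/min with it.
    print(percent_inhibition(120.0, 30.0))  # 75.0
```

Repeating the measurement over a series of compound concentrations would show how inhibition depends on dose.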



Brief Description of the Drawings

Figure 1 shows the sequence of HCV protease.

Figure 2 shows the polynucleotide sequence and deduced amino
acid sequence of the clone C20c.

Figure 3 shows the polynucleotide sequence and deduced amino
acid sequence of the clone C26d.

Figure 4 shows the polynucleotide sequence and deduced amino
acid sequence of the clone C8h.

Figure 5 shows the polynucleotide sequence and deduced amino
acid sequence of the clone C7f.

Figure 6 shows the polynucleotide sequence and deduced amino
acid sequence of the clone C31.

Figure 7 shows the polynucleotide sequence and deduced amino
acid sequence of the clone C35.

Figure 8 shows the polynucleotide sequence and deduced amino
acid sequence of the clone C33c.

Figure 9 schematically illustrates assembly of the vector
C7fC20cC300C200.

Figure 10 shows the sequence of vector cf1 SODp600.
Modes of Carrying Out The Invention
A. Definitions

The terms "Hepatitis C Virus" and "HCV" refer to the viral species
that is the major etiological agent of BB-NANBH, the prototype isolate of which
is identified in PCT WO89/04669 and EPO publication 318,216.

"HCV" as used herein includes the pathogenic strains
capable of causing hepatitis C, and attenuated strains or defective interfering
particles derived therefrom. The HCV genome is comprised of RNA. It is known
that RNA-containing viruses have relatively high rates of spontaneous mutation,
reportedly on the order of 10⁻³ to 10⁻⁴ per incorporated nucleotide (Fields &
Knipe, "Fundamental Virology" (1986, Raven Press, N.Y.)). As heterogeneity and
fluidity of genotype are inherent characteristics of RNA viruses, there will be
multiple strains/isolates, which may be virulent or avirulent, within the HCV species.
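To put such rates in perspective, the expected number of changes per newly copied genome is the per-nucleotide rate multiplied by genome length. A minimal worked sketch, assuming a rate range of roughly 10⁻³ to 10⁻⁴ per incorporated nucleotide (the range usually quoted for RNA viruses) and the genome length of about 10,000 nucleotides given for HCV later in this section:

```python
def expected_mutations(rate_per_nt: float, genome_length_nt: int) -> float:
    """Expected substitutions per newly copied genome: per-nucleotide rate x length."""
    return rate_per_nt * genome_length_nt


if __name__ == "__main__":
    genome_nt = 10_000  # approximate HCV genome length in nucleotides
    for rate in (1e-4, 1e-3):  # assumed range of per-nucleotide error rates
        print(f"rate {rate:g}: about {expected_mutations(rate, genome_nt):g} changes per copy")
```

At these rates each new genome copy would carry on the order of one to ten changes, which is consistent with the expectation of many strains/isolates within the species.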

Information on several different strains/isolates of HCV is disclosed
herein, particularly strain or isolate CDC/HCV1 (also called HCV1). Information
from one strain or isolate, such as a partial genomic sequence, is sufficient to
allow those skilled in the art using standard techniques to isolate new
strains/isolates and to identify whether such new strains/isolates are HCV. For
example, several different strains/isolates are described below. These strains, which
were obtained from a number of human sera (and from different geographical areas),
were isolated utilizing the information from the genomic sequence of HCV1.

The information provided herein suggests that HCV may be distantly
related to the flaviviridae. The Flavivirus family contains a large number of
viruses which are small, enveloped pathogens of man. The morphology and
composition of Flavivirus particles are known, and are discussed in M.A. Brinton, in
"The Viruses: The Togaviridae And Flaviviridae" (Series eds. Fraenkel-Conrat and
Wagner, vol. eds. Schlesinger and Schlesinger, Plenum Press, 1986), pp. 327-374.
Generally, with respect to morphology, Flaviviruses contain a central nucleocapsid
surrounded by a lipid bilayer. Virions are spherical and have a diameter of about
40-50 nm. Their cores are about 25-30 nm in diameter. Along the outer surface
of the virion envelope are projections measuring about 5-10 nm in length with
terminal knobs about 2 nm in diameter. Typical examples of the family include
Yellow Fever virus, West Nile virus, and Dengue Fever virus. They possess
positive-stranded RNA genomes (about 11,000 nucleotides) that are slightly larger
than that of HCV and encode a polyprotein precursor of about 3500 amino acids.
Individual viral proteins are cleaved from this precursor polypeptide.

The genome of HCV appears to be single-stranded RNA containing
about 10,000 nucleotides. The genome is positive-stranded, and possesses a
continuous translational open reading frame (ORF) that encodes a polyprotein of
about 3,000 amino acids. In the ORF, the structural proteins appear to be
encoded in approximately the first quarter of the N-terminal region, with the
majority of the polyprotein attributed to non-structural proteins. When compared
with all known viral sequences, small but significant co-linear homologies are
observed with the non-structural proteins of the Flavivirus family, and with the
pestiviruses (which are now also considered to be part of the Flavivirus family).

The Yellow Fever Virus polyprotein contains, from the amino terminus to the
carboxy terminus, the nucleocapsid protein (C), the matrix protein (M), the
envelope protein (E), and the non-structural proteins 1, 2 (a+b), 3, 4 (a+b), and
5 (NS1, NS2, NS3, NS4, and NS5). Based upon the putative amino acids encoded
in the nucleotide sequence of HCV1, a small domain at the extreme N-terminus of
the HCV polyprotein appears similar both in size and high content of basic
residues to the nucleocapsid protein (C) found at the N-terminus of flaviviral
polyproteins. The non-structural
proteins 2, 3, 4, and 5 (NS2-5) of HCV and of yellow fever virus (YFV) appear to
have counterparts of similar size and hydropathicity, although the amino acid
sequences diverge. However, the region of HCV which would correspond to the
regions of the YFV polyprotein which contain the M, E, and NS1 proteins not only
differs in sequence, but also appears to be quite different in size and
hydropathicity. Thus, while certain domains of the HCV genome may be referred to
herein as, for example, NS1, or NS2, it should be understood that these designations
are for convenience of reference only; there may be considerable differences between
the HCV family and flaviviruses that have yet to be appreciated.

Due to the evolutionary relationship of the strains or isolates of
HCV, putative HCV strains and isolates are identifiable by their homology at the
polypeptide level. With respect to the isolates disclosed herein, new HCV strains
or isolates are expected to be at least about 40% homologous, some more than
about 70% homologous, and some even more than about 80% homologous; some
may be more than about 90% homologous at the polypeptide level. The
techniques for determining amino acid sequence homology are known in the art. For
example, the amino acid sequence may be determined directly and compared to
the sequences provided herein. Alternatively, the nucleotide sequence of the
genomic material of the putative HCV may be determined (usually via a cDNA
intermediate), the amino acid sequence encoded therein can be determined, and the
corresponding regions compared.
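The comparison step can be sketched as a percent-identity calculation over two sequences that have already been aligned to equal length. This is a simplified, hypothetical measure rather than the method used here: it counts identical residues at aligned positions, skips columns where either sequence has a gap ('-'), and reports which of the 40/70/80/90% levels named above the value reaches. Practical homology scoring usually relies on dedicated alignment tools and substitution matrices.

```python
def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percent identical residues between two pre-aligned sequences (gap columns skipped)."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [(a, b) for a, b in zip(seq_a.upper(), seq_b.upper()) if a != "-" and b != "-"]
    if not pairs:
        raise ValueError("no aligned positions to compare")
    matches = sum(1 for a, b in pairs if a == b)
    return 100.0 * matches / len(pairs)


def thresholds_met(identity: float) -> list[int]:
    """Which of the 40/70/80/90 % levels mentioned in the text are reached."""
    return [t for t in (40, 70, 80, 90) if identity >= t]


if __name__ == "__main__":
    a = "GSSGGPLLCP"  # hypothetical aligned fragments, one-letter code
    b = "GSSGGPILCP"
    pid = percent_identity(a, b)
    print(pid, thresholds_met(pid))  # 90.0 [40, 70, 80, 90]
```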

The term "HCV protease" refers to an enzyme derived from HCV
which exhibits proteolytic activity, specifically the polypeptide encoded in the NS3
domain of the HCV genome. At least one strain of HCV contains a protease
believed to be substantially encoded by or within the following sequence:
Arg Arg Gly Arg Glu Ile Leu Leu Gly Pro  10
Ala Asp Gly Met Val Ser Lys Gly Trp Arg  20
Leu Leu Ala Pro Ile Thr Ala Tyr Ala Gln  30
Gln Thr Arg Gly Leu Leu Gly Cys Ile Ile  40
Thr Ser Leu Thr Gly Arg Asp Lys Asn Gln  50
Val Glu Gly Glu Val Gln Ile Val Ser Thr  60
Ala Ala Gln Thr Phe Leu Ala Thr Cys Ile  70
Asn Gly Val Cys Trp Thr Val Tyr His Gly  80
Ala Gly Thr Arg Thr Ile Ala Ser Pro Lys  90
Gly Pro Val Ile Gln Met Tyr Thr Asn Val 100
Asp Gln Asp Leu Val Gly Trp Pro Ala Ser 110
Gln Gly Thr Arg Ser Leu Thr Pro Cys Thr 120
Cys Gly Ser Ser Asp Leu Tyr Leu Val Thr 130
Arg His Ala Asp Val Ile Pro Val Arg Arg 140
Arg Gly Asp Ser Arg Gly Ser Leu Leu Ser 150
Pro Arg Pro Ile Ser Tyr Leu Lys Gly Ser 160
Ser Gly Gly Pro Leu Leu Cys Pro Ala Gly 170
His Ala Val Gly Ile Phe Arg Ala Ala Val 180
Cys Thr Arg Gly Val Ala Lys Ala Val Asp 190
Phe Ile Pro Val Glu Asn Leu Glu Thr Thr 200
Met Arg ••• 202

The above N and C termini are putative, the actual termini being
defined by expression and processing in an appropriate host of a DNA construct
encoding the entire NS3 domain. It is understood that this sequence may vary
from strain to strain, as RNA viruses like HCV are known to exhibit a great deal
of variation. Further, the actual N and C termini may vary, as the protease is
cleaved from a precursor polyprotein: variations in the protease amino acid
sequence can result in cleavage from the polyprotein at different points. Thus,
the amino- and carboxy-termini may differ from strain to strain of HCV. The first
amino acid shown above corresponds to residue 60 in Figure 1. However, the
minimum sequence necessary for activity can be determined by routine methods.
The sequence may be truncated at either end by treating an appropriate expression
vector with an exonuclease (after cleavage at the 5' or 3' end of the coding
sequence) to remove any desired number of base pairs. The resulting coding
polynucleotide is then expressed and the sequence determined. In this manner
the activity of the resulting product may be correlated with the amino acid
sequence: a limited series of such experiments (removing progressively greater
numbers of base pairs) determines the minimum internal sequence necessary for
protease activity. We have found that the sequence may be substantially truncated,
particularly at the carboxy terminus, apparently with full retention of protease
activity. It is presently believed that a portion of the protein at the carboxy
terminus may exhibit helicase activity. However, helicase activity is not required
of the HCV proteases of the invention. The amino terminus may also be truncated
to a degree without loss of protease activity.
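The "limited series" of truncation experiments can be organized as a search. A minimal sketch, assuming that activity is lost monotonically as more residues are removed (once a truncation is inactive, every longer truncation is too) and representing each expression-plus-activity experiment by a hypothetical callback `is_active`; bisection then locates the largest tolerated truncation in about log2(N) experiments instead of N. For illustration, truncations are counted in whole codons (3 bp per residue).

```python
from typing import Callable


def max_tolerated_truncation(is_active: Callable[[int], bool], max_remove: int) -> int:
    """Largest number of residues removable from one terminus with activity retained.

    Assumes is_active(0) is True and that activity, once lost, stays lost for every
    larger truncation (monotonic), so bisection over 0..max_remove is valid.
    """
    if is_active(max_remove):
        return max_remove
    lo, hi = 0, max_remove  # is_active(lo) is True, is_active(hi) is False
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_active(mid):
            lo = mid
        else:
            hi = mid
    return lo


def bp_for_residues(residues: int) -> int:
    """Base pairs in a whole-codon deletion of `residues` amino acids (3 bp per codon)."""
    return 3 * residues


if __name__ == "__main__":
    # Hypothetical assay outcome: activity survives removal of up to 45 residues.
    def fake_assay(n_removed: int) -> bool:
        return n_removed <= 45

    n = max_tolerated_truncation(fake_assay, 150)
    print(n, bp_for_residues(n))  # 45 135
```

If activity does not fall off monotonically, a bisection can miss the true boundary, and a plain stepwise series would be the safer design.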

The amino acids underlined above are believed to be the residues
necessary for catalytic activity, based on sequence homology to putative flavivirus
serine proteases. Table 1 shows the alignment of the three serine protease
catalytic residues for HCV protease and the proteases obtained from Yellow Fever
Virus, West Nile Fever virus, Murray Valley Fever virus, and Kunjin virus.
Although the other four flavivirus protease sequences exhibit higher homology
with each other than with HCV, a degree of homology is still observed with HCV.
This homology, however, was not sufficient to be detected by currently available
alignment software. The indicated amino acids are numbered His79, Asp103, and
Ser161 in the sequence listed above (His139, Asp163, and Ser221 in Figure 1).
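Claim 8 places the catalytic histidine, aspartate, and serine at polyprotein positions 1084, 1108, and 1166. Counting along the listing above, the same residues fall at His79, Asp103, and Ser161, and each pair differs by 1005. A minimal sketch for converting between the two numbering schemes, assuming that this inferred offset (it is not stated explicitly in the text) holds across the whole listed fragment:

```python
# Offset between residue numbers in the listed fragment and the HCV polyprotein,
# inferred from the catalytic triad (e.g. 1084 - 79 = 1005); assumed constant
# across the listed fragment.
LISTING_TO_POLYPROTEIN = 1005

# Catalytic residues as numbered in the listed sequence.
CATALYTIC_TRIAD_LISTING = {"His": 79, "Asp": 103, "Ser": 161}


def to_polyprotein(listing_pos: int) -> int:
    """Convert a position in the listed fragment to polyprotein numbering."""
    if listing_pos < 1:
        raise ValueError("listing positions start at 1")
    return listing_pos + LISTING_TO_POLYPROTEIN


def to_listing(polyprotein_pos: int) -> int:
    """Convert a polyprotein position back to the listed fragment's numbering."""
    pos = polyprotein_pos - LISTING_TO_POLYPROTEIN
    if pos < 1:
        raise ValueError("position lies upstream of the listed fragment")
    return pos


if __name__ == "__main__":
    for name, pos in CATALYTIC_TRIAD_LISTING.items():
        print(f"{name}{pos} -> {name}{to_polyprotein(pos)}")
    # His79 -> His1084, Asp103 -> Asp1108, Ser161 -> Ser1166
```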



TABLE 1: Alignment of Active Residues by Sequence

Protease            His          Asp           Ser

HCV                 CWTVYHGAG    DQDLVGWPAP    LKGSSGGPL
Yellow Fever        FHTMWHVTR    KEDLVAYGG     PSGTSGSPI
West Nile Fever     FHTLWHTTK    KEDRLCYGG     PTGTSGSPI
Murray Valley       FHTLWHTTR    KEDRVTYGG     PIGTSGSPI
Kunjin virus        FHTLWHTTK    KEDRLCYGG     PTGTSGSPI
Alternatively, one can make catalytic residue
assignments based on structural homology. Table 2 shows
the alignment of HCV against the catalytic sites of several
well-characterized serine proteases based on structural
considerations: protease A from Streptomyces griseus, α-lytic
protease, bovine trypsin, chymotrypsin, and elastase (M.
James et al, Can J Biochem (1978) 56:396). Again, a degree
of homology is observed. The HCV residues identified are
numbered His79, Asp125, and Ser161 in the sequence listed
above.
TABLE 2: Alignment of Active Residues by Structure

Protease             His      Asp        Ser

S. griseus A         TAGHC    NNDYGII    GDSGGSL
α-Lytic protease     TAGHC    GNDRAWV    GDSGGSW
Bovine Trypsin       SAAHC    NNDIMLI    GDSGGPV
Chymotrypsin         TAAHC    NHDITLL    GDSGGPL
Elastase             TAAHC    GYDIALL    GDSGGPL
HCV                  TVYHG    SSDLYLV    GSSGGPL
The most direct manner to verify the residues essential to the active
site is to replace each residue individually with a residue of equivalent steric size.
This is easily accomplished by site-specific mutagenesis and similar methods
known in the art. If replacement of a particular residue with a residue of
equivalent size results in loss of activity, the essential nature of the replaced
residue is confirmed.
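At the DNA level, replacing one residue means changing the codon that encodes it in the expression vector. A minimal sketch, assuming a hypothetical in-frame coding sequence that starts at its first codon; it shows the index arithmetic (residue i occupies nucleotides 3(i-1) through 3(i-1)+2) and uses Ser-to-Ala, a common choice for probing a catalytic serine, purely as an example. The DNA is invented, its protein sequence mirrors the Gly-Ser-Ser-Gly-Gly-Pro-Leu motif around Ser161, and the codon table is only a small subset of the standard genetic code.

```python
# Small subset of the standard genetic code, enough to check this example.
CODON_TO_AA = {
    "TCT": "S", "TCC": "S", "AGC": "S",
    "GCT": "A", "GCC": "A",
    "GGT": "G", "GGC": "G",
    "CCT": "P", "CTG": "L",
}


def substitute_codon(cds: str, residue: int, new_codon: str) -> str:
    """Return a copy of an in-frame coding sequence with one codon replaced.

    `residue` is 1-based: residue i occupies cds[3*(i-1) : 3*(i-1) + 3].
    """
    if len(cds) % 3 != 0 or len(new_codon) != 3:
        raise ValueError("coding sequence and codon must be whole codons")
    start = 3 * (residue - 1)
    if residue < 1 or start + 3 > len(cds):
        raise ValueError("residue outside the coding sequence")
    return cds[:start] + new_codon.upper() + cds[start + 3:]


def translate(cds: str) -> str:
    """Translate using the subset table above ('X' for codons not listed)."""
    return "".join(CODON_TO_AA.get(cds[i:i + 3], "X") for i in range(0, len(cds), 3))


if __name__ == "__main__":
    # Hypothetical fragment encoding G-S-S-G-G-P-L (one-letter code).
    cds = "GGTTCTAGCGGTGGCCCTCTG"
    mutant = substitute_codon(cds, 3, "GCT")  # third residue: Ser -> Ala
    print(translate(cds), translate(mutant))  # GSSGGPL GSAGGPL
```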

"HCV protease analogs" refer to polypeptides which vary from the 
full length protease sequence by deletion, alteration and/or addition to the amino 
acid sequence of the native protease. HCV protease analogs include the trun- 
cated proteases described above, as well as HCV protease muteins and fusion 
proteins comprising HCV protease, truncated protease, or protease muteins. 
Alterations to form HCV protease muteins are preferably conservative amino acid
substitutions, in which an amino acid is replaced with another naturally-occurring
amino acid of similar character. For example, the following substitutions are
considered "conservative":

Gly - Ala; Lys - Arg;

Val - Ile - Leu; Asn - Gln; and

Asp - Glu; Phe - Trp - Tyr.
Nonconservative changes are generally substitutions of one of the above amino
acids with an amino acid from a different group (e.g., substituting Asn for Glu), or
substituting Cys, Met, His, or Pro for any of the above amino acids. Substitutions
involving common amino acids are conveniently performed by site-specific
mutagenesis of an expression vector encoding the desired protein, and subsequent
expression of the altered form. One may also alter amino acids by synthetic or
semi-synthetic methods. For example, one may convert cysteine or serine residues
to selenocysteine by appropriate chemical treatment of the isolated protein.
Alternatively, one may incorporate uncommon amino acids in standard in vitro
protein synthetic methods. Typically, the total number of residues changed,
deleted or added to the native sequence in the muteins will be no more than
about 20, preferably no more than about 10, and most preferably no more than
about 5.
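The substitution rules and size limits above can be captured in a few lines. A minimal sketch using one-letter codes: the group table reproduces the pairs and triples listed as conservative, Cys, Met, His, and Pro sit outside every group, and the function counts differences between a native sequence and an equal-length mutein. It handles substitutions only (insertions and deletions would need an alignment first), treats the "about" limits as hard cutoffs, and the tier labels are illustrative.

```python
# Conservative groups from the text: Gly-Ala; Lys-Arg; Val-Ile-Leu; Asn-Gln;
# Asp-Glu; Phe-Trp-Tyr. Cys, Met, His and Pro belong to no group.
CONSERVATIVE_GROUPS = ["GA", "KR", "VIL", "NQ", "DE", "FWY"]
GROUP_OF = {aa: i for i, group in enumerate(CONSERVATIVE_GROUPS) for aa in group}


def is_conservative(native: str, replacement: str) -> bool:
    """True if both residues fall in the same conservative group."""
    g1, g2 = GROUP_OF.get(native), GROUP_OF.get(replacement)
    return g1 is not None and g1 == g2


def classify_mutein(native_seq: str, mutein_seq: str) -> tuple[int, int, str]:
    """Return (substitutions, conservative substitutions, tier) for equal-length sequences."""
    if len(native_seq) != len(mutein_seq):
        raise ValueError("this sketch handles substitutions only")
    changes = [(a, b) for a, b in zip(native_seq, mutein_seq) if a != b]
    n_cons = sum(is_conservative(a, b) for a, b in changes)
    n = len(changes)
    if n <= 5:
        tier = "most preferred (<= 5)"
    elif n <= 10:
        tier = "preferred (<= 10)"
    elif n <= 20:
        tier = "typical (<= 20)"
    else:
        tier = "outside the stated range"
    return n, n_cons, tier


if __name__ == "__main__":
    print(is_conservative("V", "L"), is_conservative("N", "E"))  # True False
    print(classify_mutein("GSSGGPL", "ASSGGPI"))  # (2, 2, 'most preferred (<= 5)')
```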

The term "fusion protein" generally refers to a polypeptide comprising
an amino acid sequence drawn from two or more individual proteins. In the
present invention, "fusion protein" is used to denote a polypeptide comprising the
HCV protease, truncate, mutein or a functional portion thereof, fused to a
non-HCV protein or polypeptide ("fusion partner"). Fusion proteins are most
conveniently produced by expression of a fused gene, which encodes a portion of one
polypeptide at the 5' end and a portion of a different polypeptide at the 3' end,
where the different portions are joined in one reading frame which may be
expressed in a suitable host. It is presently preferred (although not required) to
position the HCV protease or analog at the carboxy terminus of the fusion
protein, and to employ a functional enzyme fragment at the amino terminus. As the
HCV protease is normally expressed within a large polyprotein, it is not expected
to include cell transport signals (e.g., export or secretion signals). Suitable
functional enzyme fragments are those polypeptides which exhibit a quantifiable
activity when expressed fused to the HCV protease. Exemplary enzymes include,
without limitation, β-galactosidase (β-gal), β-lactamase, horseradish peroxidase (HRP),
glucose oxidase (GO), human superoxide dismutase (hSOD), urease, and the like.
These enzymes are convenient because the amount of fusion protein produced
can be quantified by means of simple colorimetric assays. Alternatively, one may
employ antigenic proteins or fragments, to permit simple detection and
quantification of fusion proteins using antibodies specific for the fusion partner. The
presently preferred fusion partner is hSOD.
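The requirement that both portions be joined "in one reading frame" can be checked mechanically. A minimal sketch, assuming hypothetical DNA strings for a fusion partner (placed at the 5' end, matching the preferred orientation described above) and an HCV protease coding sequence: the partner's stop codon, if present, is removed, both parts must be whole codons, and no in-frame stop may remain before the final codon of the fused gene. Linkers, promoters, and host-specific details are out of scope.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def codons(seq: str) -> list[str]:
    """Split an in-frame sequence into consecutive triplets."""
    return [seq[i:i + 3] for i in range(0, len(seq), 3)]


def fuse_in_frame(partner_cds: str, protease_cds: str) -> str:
    """Join partner (5') and protease (3') coding sequences in one reading frame."""
    partner, protease = partner_cds.upper(), protease_cds.upper()
    if len(partner) % 3 or len(protease) % 3:
        raise ValueError("both coding sequences must be whole codons")
    if partner[-3:] in STOP_CODONS:
        partner = partner[:-3]  # drop the partner's stop so translation reads through
    fused = partner + protease
    # Any stop codon before the final codon would truncate the fusion protein.
    if any(c in STOP_CODONS for c in codons(fused)[:-1]):
        raise ValueError("in-frame stop codon inside the fused gene")
    return fused


if __name__ == "__main__":
    partner = "ATGGCTAAATAA"      # hypothetical: Met-Ala-Lys-stop
    protease = "GGTTCTAGCGGTTAA"  # hypothetical: Gly-Ser-Ser-Gly-stop
    print(fuse_in_frame(partner, protease))  # ATGGCTAAAGGTTCTAGCGGTTAA
```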

B. General Method 

The practice of the present invention generally employs conventional
techniques of molecular biology, microbiology, recombinant DNA, and
immunology, which are within the skill of the art. Such techniques are explained
fully in the literature. See, for example, J. Sambrook et al, "Molecular Cloning: A
Laboratory Manual" (1989); "DNA Cloning", Vol. I and II (D.N. Glover ed. 1985);
"Oligonucleotide Synthesis" (M.J. Gait ed. 1984); "Nucleic Acid Hybridization"
(B.D. Hames & S.J. Higgins eds. 1984); "Transcription And Translation" (B.D.
Hames & S.J. Higgins eds. 1984); "Animal Cell Culture" (R.I. Freshney ed. 1986);
"Immobilized Cells And Enzymes" (IRL Press, 1986); B. Perbal, "A Practical
Guide To Molecular Cloning" (1984); the series, "Methods In Enzymology"
(Academic Press, Inc.); "Gene Transfer Vectors For Mammalian Cells" (J.H.
Miller and M.P. Calos eds. 1987, Cold Spring Harbor Laboratory); Meth
Enzymol (1987) 154 and 155 (Wu and Grossman, and Wu, eds., respectively);
Mayer & Walker, eds. (1987), "Immunochemical Methods In Cell And Molecular
Biology" (Academic Press, London); Scopes, "Protein Purification: Principles And
Practice", 2nd Ed. (Springer-Verlag, N.Y., 1987); and "Handbook Of Experimental
Immunology", volumes I-IV (Weir and Blackwell, eds., 1986).

Both prokaryotic and eukaryotic host cells are useful for expressing
desired coding sequences when appropriate control sequences compatible with the
designated host are used. Among prokaryotic hosts, E. coli is most frequently
used. Expression control sequences for prokaryotes include promoters, optionally
containing operator portions, and ribosome binding sites. Transfer vectors
compatible with prokaryotic hosts are commonly derived from, for example, pBR322,
a plasmid containing operons conferring ampicillin and tetracycline resistance, and
the various pUC vectors, which also contain sequences conferring antibiotic
resistance markers. These plasmids are commercially available. The markers may be
used to obtain successful transformants by selection. Commonly used prokaryotic
control sequences include the β-lactamase (penicillinase) and lactose promoter
systems (Chang et al, Nature (1977) 128:1056), the tryptophan (trp) promoter
system (Goeddel et al, Nucleic Acids Res (1980) 8:4057), the lambda-derived PL
promoter and N gene ribosome binding site (Shimatake et al, Nature (1981)
292:128), and the hybrid tac promoter (De Boer et al, Proc Natl Acad Sci USA
(1983) 80:21-25) derived from sequences of the trp and lac UV5 promoters. The
foregoing systems are particularly compatible with E. coli; if desired, other
prokaryotic hosts such as strains of Bacillus or Pseudomonas may be used, with
corresponding control sequences.

Eukaryotic hosts include without limitation yeast and mammalian
cells in culture systems. Yeast expression hosts include Saccharomyces,
Kluyveromyces, Pichia, and the like. Saccharomyces cerevisiae, Saccharomyces
carlsbergensis, and K. lactis are the most commonly used yeast hosts, and are
convenient fungal hosts. Yeast-compatible vectors carry markers which permit
selection of successful transformants by conferring prototrophy to auxotrophic
mutants or resistance to heavy metals on wild-type strains. Yeast-compatible
vectors may employ the 2μ origin of replication (Broach et al, Meth Enzymol
(1983) 101:307), the combination of CEN3 and ARS1, or other means for assuring
replication, such as sequences which will result in incorporation of an appropriate
fragment into the host cell genome.
Control sequences for yeast vectors are known in the art and include promoters 
for the synthesis of glycolytic enzymes (Hess et al, J Adv Enzyme Reg (1968) 7:149;
Holland et al, Biochemistry (1978) 17:4900), including the promoter for
3-phosphoglycerate kinase (R. Hitzeman et al, J Biol Chem (1980) 255:2073).
Terminators may also be included, such as those derived from the enolase gene
(Holland, J Biol Chem (1981) 256:1385). Particularly useful control systems are
those which comprise the glyceraldehyde-3-phosphate dehydrogenase (GAPDH)
promoter or alcohol dehydrogenase (ADH) regulatable promoter, terminators also
derived from GAPDH, and, if secretion is desired, a leader sequence derived from
yeast α-factor (see U.S. Pat. No. 4,870,008, incorporated herein by reference).

A presently preferred expression system employs the ubiquitin
leader as the fusion partner.

Yeast ubiquitin provides a 76 amino acid polypeptide which is automatically
cleaved from the fused protein upon expression. The ubiquitin amino acid
sequence is as follows:

Gln Ile Phe Val Lys Thr Leu Thr Gly Lys Thr Ile Thr
Leu Glu Val Glu Ser Ser Asp Thr Ile Asp Asn Val
Lys Ser Lys Ile Gln Asp Lys Glu Gly Ile Pro Pro Asp
Gln Gln Arg Leu Ile Phe Ala Gly Lys Gln Leu Glu
Asp Gly Arg Thr Leu Ser Asp Tyr Asn Ile Gln Lys
Glu Ser Thr Leu His Leu Val Leu Arg Leu Arg Gly
Gly

See also Ozkaynak et al, Nature (1984) 312:663-66. Polynucleotides
encoding the ubiquitin polypeptide may be synthesized by standard methods, for
example following the technique of Barr et al, J Biol Chem (1988) 263:1671-78
using an Applied Biosystems 380A DNA synthesizer. Using appropriate linkers,
the ubiquitin gene may be inserted into a suitable vector and ligated to a
sequence encoding the HCV protease or a fragment thereof.



PATENT 
0100.100 



2079105 

- 15- 



In addition, the transcriptional regulatory region and the transcriptional
initiation region which are operably linked may be such that they are not
naturally associated in the wild-type organism. These systems are described in
detail in EPO 120,551, published October 3, 1984; EPO 116,201, published
August 22, 1984; and EPO 164,556, published December 18, 1985, all of which
are commonly owned with the present invention.

Mammalian cell lines available as hosts for expression are known in
the art and include many immortalized cell lines available from the American
Type Culture Collection (ATCC), including HeLa cells, Chinese hamster ovary
(CHO) cells, baby hamster kidney (BHK) cells, and a number of other cell lines.
Suitable promoters for mammalian cells are also known in the art and include
viral promoters such as that from Simian Virus 40 (SV40) (Fiers et al, Nature
(1978) 273:113), Rous sarcoma virus (RSV), adenovirus (ADV), and bovine
papilloma virus (BPV). Mammalian cells may also require terminator sequences and
poly-A addition sequences. Enhancer sequences which increase expression may
also be included, and sequences which promote amplification of the gene may
also be desirable (for example, methotrexate resistance genes). These sequences
are known in the art.

Vectors suitable for replication in mammalian cells are known in the
art, and may include viral replicons, or sequences which ensure integration of the
appropriate sequences encoding HCV epitopes into the host genome. For
example, another vector used to express foreign DNA is vaccinia virus. In this
case the heterologous DNA is inserted into the vaccinia genome. Techniques for
the insertion of foreign DNA into the vaccinia virus genome are known in the art,
and may utilize, for example, homologous recombination. The heterologous DNA
is generally inserted into a gene which is non-essential to the virus, for example,
the thymidine kinase gene (tk), which also provides a selectable marker. Plasmid
vectors that greatly facilitate the construction of recombinant viruses have been
described (see, for example, Mackett et al, J Virol (1984) 49:857; Chakrabarti et
al, Mol Cell Biol (1985) 5:3403; Moss, in GENE TRANSFER VECTORS FOR
MAMMALIAN CELLS (Miller and Calos, eds., Cold Spring Harbor Laboratory,
NY, 1987), p. 10). Expression of the HCV polypeptide then occurs in cells or
animals which are infected with the live recombinant vaccinia virus.

In order to detect whether or not the HCV polypeptide is expressed
from the vaccinia vector, BSC-1 cells may be infected with the recombinant vector
and grown on microscope slides under conditions which allow expression. The
cells may then be acetone-fixed, and immunofluorescence assays performed using
serum which is known to contain anti-HCV antibodies to a polypeptide(s)
encoded in the region of the HCV genome from which the HCV segment in the
recombinant expression vector was derived.

Other systems for expression of eukaryotic or viral genomes include
insect cells and vectors suitable for use in these cells. These systems are known in
the art, and include, for example, insect expression transfer vectors derived from
the baculovirus Autographa californica nuclear polyhedrosis virus (AcNPV), which
is a helper-independent, viral expression vector. Expression vectors derived from
this system usually use the strong viral polyhedrin gene promoter to drive
expression of heterologous genes. Currently the most commonly used transfer
vector for introducing foreign genes into AcNPV is pAc373 (see PCT
WO89/045599). Many other vectors known to those of skill in the art have also
been designed for improved expression. These include, for example, pVL985
(which alters the polyhedrin start codon from ATG to ATT, and introduces a
BamHI cloning site 32 bp downstream from the ATT; see Luckow and Summers,
Virol (1989) 170:31). AcNPV transfer vectors for high level expression of
nonfused foreign proteins are described in copending applications PCT
WO89/046699 and USSN 7/456,637. A unique BamHI site is located following
position -8 with respect to the translation initiation codon ATG of the polyhedrin
gene. There are no cleavage sites for SmaI, PstI, BglII, XbaI or SstI. Good
expression of nonfused foreign proteins usually requires foreign genes that ideally
have a short leader sequence containing suitable translation initiation signals
preceding an ATG start signal. The plasmid also contains the polyhedrin
polyadenylation signal and the ampicillin-resistance (amp) gene and origin of
replication for selection and propagation in E. coli.

Methods for the introduction of heterologous DNA into the desired
site in the baculovirus are known in the art (see Summers and Smith, Texas
Agricultural Experiment Station Bulletin No. 1555; Smith et al, Mol Cell Biol
(1983) 3:2156-2165; and Luckow and Summers, Virol (1989) 170:31). For example,
the heterologous DNA can be inserted into a gene such as the polyhedrin gene by
homologous recombination, or into a restriction enzyme site engineered into the
desired baculovirus gene. The inserted sequences may be those which encode all
or varying segments of the polyprotein, or other ORFs which encode viral
polypeptides. For example, the insert could encode the following segments of the
polyprotein: amino acids 1-1078; amino acids 332-662; amino acids 406-662;
amino acids 156-328; and amino acids 199-328.

The signals for post-translational modifications, such as signal
peptide cleavage, proteolytic cleavage, and phosphorylation, appear to be
recognized by insect cells. The signals required for secretion and nuclear
accumulation also appear to be conserved between invertebrate cells and
vertebrate cells. Examples of signal sequences from vertebrate cells which are
effective in invertebrate cells are known in the art; for example, the human
interleukin-2 signal (IL2s), which signals for secretion from the cell, is recognized
and properly removed in insect cells.

Transformation may be by any known method for introducing
polynucleotides into a host cell, including, for example, packaging the
polynucleotide in a virus and transducing a host cell with the virus, or direct uptake
of the polynucleotide. The transformation procedure used depends upon the host
to be transformed. Bacterial transformation by direct uptake generally employs
treatment with calcium or rubidium chloride (Cohen, Proc Nat Acad Sci USA (1972)
69:2110; T. Maniatis et al, "Molecular Cloning: A Laboratory Manual" (Cold
Spring Harbor Press, Cold Spring Harbor, NY, 1982)). Yeast transformation by
direct uptake may be carried out using the method of Hinnen et al, Proc Nat
Acad Sci USA (1978) 75:1929. Mammalian transformations by direct uptake may
be conducted using the calcium phosphate precipitation method of Graham and
Van der Eb, Virology (1978) 52:546, or the various known modifications thereof.
Other methods for introducing recombinant polynucleotides into cells, particularly
into mammalian cells, include dextran-mediated transfection, calcium phosphate
mediated transfection, polybrene mediated transfection, protoplast fusion,
electroporation, encapsulation of the polynucleotide(s) in liposomes, and direct
microinjection of the polynucleotides into nuclei.

Vector construction employs techniques which are known in the art.
Site-specific DNA cleavage is performed by treating with suitable restriction
enzymes under conditions which generally are specified by the manufacturer of
these commercially available enzymes. In general, about 1 µg of plasmid or DNA
sequence is cleaved by 1 unit of enzyme in about 20 µL buffer solution by
incubation for 1-2 hr at 37°C. After incubation with the restriction enzyme, protein
is removed by phenol/chloroform extraction and the DNA recovered by
precipitation with ethanol. The cleaved fragments may be separated using
polyacrylamide or agarose gel electrophoresis techniques, according to the general
procedures described in Meth Enzymol (1980) 65:499-560.
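The rule of thumb above (about 1 unit of enzyme per µg of DNA in about 20 µL) can be scaled to other amounts of DNA. The Python sketch below is a minimal illustration that assumes enzyme and reaction volume both scale linearly with DNA, a 10x reaction buffer, and a 10 U/µL enzyme stock; the function name `plan_digest` and these defaults are hypothetical, not manufacturer specifications. It also flags enzyme volumes above 10% of the reaction, a common precaution because restriction enzymes are supplied in glycerol.

```python
def plan_digest(dna_ug, dna_ug_per_ul, enzyme_units_per_ul=10.0,
                units_per_ug=1.0, ul_per_ug=20.0, buffer_fold=10.0):
    """Return component volumes (uL) for a restriction digest.

    Assumes enzyme units and reaction volume both scale linearly with the
    amount of DNA (about 1 U per ug in about 20 uL, per the text above).
    """
    total_ul = dna_ug * ul_per_ug
    enzyme_ul = dna_ug * units_per_ug / enzyme_units_per_ul
    dna_ul = dna_ug / dna_ug_per_ul
    buffer_ul = total_ul / buffer_fold
    water_ul = total_ul - enzyme_ul - dna_ul - buffer_ul
    if water_ul < 0:
        raise ValueError("DNA too dilute for this reaction volume")
    if enzyme_ul > 0.1 * total_ul:
        raise ValueError("enzyme (in glycerol) would exceed 10% of the volume")
    return {"total": total_ul, "dna": dna_ul, "buffer": buffer_ul,
            "enzyme": enzyme_ul, "water": water_ul}


# Example: 2 ug of plasmid supplied at 0.5 ug/uL.
for component, volume in plan_digest(2.0, 0.5).items():
    print(f"{component:>7}: {volume:.1f} uL")
```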

Sticky-ended cleavage fragments may be blunt ended using E. coli
DNA polymerase I (Klenow fragment) with the appropriate deoxynucleotide
triphosphates (dNTPs) present in the mixture. Treatment with S1 nuclease may also
be used, resulting in the hydrolysis of any single-stranded DNA portions.

WO 91/15575                    PCT/US91/02210
- 19-

Ligations are carried out under standard buffer and temperature
conditions using T4 DNA ligase and ATP; sticky end ligations require less ATP
and less ligase than blunt end ligations. When vector fragments are used as part
of a ligation mixture, the vector fragment is often treated with bacterial alkaline
phosphatase (BAP) or calf intestinal alkaline phosphatase to remove the
5'-phosphate, thus preventing religation of the vector. Alternatively, restriction
enzyme digestion of unwanted fragments can be used to prevent ligation.

Ligation mixtures are transformed into suitable cloning hosts, such
as E. coli, and successful transformants selected using the markers incorporated
(e.g., antibiotic resistance), and screened for the correct construction.

Synthetic oligonucleotides may be prepared using an automated
oligonucleotide synthesizer as described by Warner, DNA (1984) 3:401. If
desired, the synthetic strands may be labeled with 32P by treatment with
polynucleotide kinase in the presence of 32P-ATP under standard reaction conditions.

DNA sequences, including those isolated from cDNA libraries, may
be modified by known techniques, for example by site-directed mutagenesis (see,
e.g., Zoller, Nuc Acids Res (1982) 10:6487). Briefly, the DNA to be modified is
packaged into phage as a single-stranded sequence, and converted to double-stranded
DNA with DNA polymerase, using as a primer a synthetic oligonucleotide
complementary to the portion of the DNA to be modified, where the desired
modification is included in the primer sequence. The resulting double-stranded
DNA is transformed into a phage-supporting host bacterium. Cultures of the
transformed bacteria, which contain copies of each strand of the phage, are plated
in agar to obtain plaques. Theoretically, 50% of the new plaques contain phage
having the mutated sequence, and the remaining 50% have the original sequence.
Replicates of the plaques are hybridized to labeled synthetic probe at temperatures
and conditions which permit hybridization with the correct strand, but not
with the unmodified sequence. The sequences which have been identified by
hybridization are recovered and cloned.
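The primer design step and the idealized 50% yield described above can be sketched in Python. This is a minimal illustration under stated assumptions: the single-stranded phage carries the strand written 5'→3' in `template`, the primer is simply the reverse complement of that region with the desired base change, and mutant plaques occur at the theoretical 50% frequency (the fraction is often lower in practice). The function names and the example template are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA string (A, C, G, T only)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def mutagenic_primer(template: str, pos: int, new_base: str, flank: int = 10) -> str:
    """Primer (5'->3') that anneals to the single-stranded `template` and
    carries a substitution at 0-based `pos`, with `flank` matched bases per side."""
    if not 0 <= pos < len(template):
        raise IndexError("position outside template")
    mutated = template[:pos] + new_base + template[pos + 1:]
    start, end = max(0, pos - flank), min(len(template), pos + flank + 1)
    return reverse_complement(mutated[start:end])


def p_at_least_one_mutant(n_plaques: int, mutant_fraction: float = 0.5) -> float:
    """Chance that screening n plaques finds at least one mutant."""
    return 1.0 - (1.0 - mutant_fraction) ** n_plaques


# Hypothetical template: codon TCC (Ser) at positions 6-8; T->G gives GCC (Ala).
template = "AGCGGTTCCGGAGCA"
print(mutagenic_primer(template, 6, "G", flank=4))  # CCGGCACCG
print(p_at_least_one_mutant(5))                     # 0.96875
```

Screening only a handful of plaques is thus expected to find a mutant in the idealized case, which is why hybridization with the mutagenic oligonucleotide as probe is an efficient screen.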

DNA libraries may be probed using the procedure of Grunstein and
Hogness, Proc Nat Acad Sci USA (1975) 72:3961. Briefly, in this procedure the
DNA to be probed is immobilized on nitrocellulose filters, denatured, and
prehybridized with a buffer containing 0-50% formamide, 0.75 M NaCl, 75 mM Na
citrate, 0.02% (wt/v) each of bovine serum albumin, polyvinylpyrrolidone, and
Ficoll, 50 mM NaH2PO4 (pH 6.5), 0.1% SDS, and 100 µg/mL carrier denatured
DNA. The percentage of formamide in the buffer, as well as the time and
temperature conditions of the prehybridization and subsequent hybridization steps,
depend on the stringency required. Oligomeric probes which require lower
stringency conditions are generally used with low percentages of formamide, lower
temperatures, and longer hybridization times. Probes containing more than 30 or
40 nucleotides, such as those derived from cDNA or genomic sequences, generally
employ higher temperatures, e.g., about 40-42°C, and a high percentage of
formamide, e.g., 50%. Following prehybridization, 32P-labeled oligonucleotide probe is
added to the buffer, and the filters are incubated in this mixture under
hybridization conditions. After washing, the treated filters are subjected to
autoradiography to show the location of the hybridized probe; DNA in
corresponding locations on the original agar plates is used as the source of the desired
DNA.
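Choosing stringency usually starts from a rough estimate of the probe's melting temperature. The Python sketch below uses two widely quoted rules of thumb rather than any method of this disclosure: the Wallace rule, Tm ≈ 2(A+T) + 4(G+C) °C, for oligonucleotides shorter than about 14 nucleotides, and the basic formula Tm ≈ 64.9 + 41(G+C − 16.4)/N °C for longer ones. Both ignore the salt and formamide concentrations actually used, so the numbers are only starting points; the example uses the 21-mer probe given for clone C20c in Example 3, and the function names are illustrative.

```python
def gc_count(seq: str) -> int:
    """Number of G and C bases in a DNA string."""
    seq = seq.upper()
    return seq.count("G") + seq.count("C")


def estimate_tm(seq: str) -> float:
    """Rough Tm in deg C: Wallace rule below 14 nt, basic GC formula otherwise.

    Assumes an A/C/G/T-only sequence and ignores salt and formamide effects.
    """
    seq = seq.upper()
    n, gc = len(seq), gc_count(seq)
    if n < 14:
        return 2.0 * (n - gc) + 4.0 * gc
    return 64.9 + 41.0 * (gc - 16.4) / n


probe = "TGCATCAATGGGGTGTGCTGG"  # clone C20c probe from Example 3
print(f"{len(probe)} nt, GC = {gc_count(probe) / len(probe):.0%}, "
      f"Tm ~ {estimate_tm(probe):.1f} C")
```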

For routine vector constructions, ligation mixtures are transformed
into E. coli strain HB101 or other suitable hosts, and successful transformants
selected by antibiotic resistance or other markers. Plasmids from the transformants
are then prepared according to the method of Clewell et al, Proc Nat Acad Sci
USA (1969) 62:1159, usually following chloramphenicol amplification (Clewell,
J Bacteriol (1972) 110:667). The DNA is isolated and analyzed, usually by
restriction enzyme analysis and/or sequencing. Sequencing may be performed by the
dideoxy method of Sanger et al, Proc Nat Acad Sci USA (1977) 74:5463, as
further described by Messing et al, Nuc Acids Res (1981) 9:309, or by the method of
Maxam et al, Meth Enzymol (1980) 65:499. Problems with band compression,
which are sometimes observed in GC-rich regions, were overcome by use of
7-deazaguanosine according to Barr et al, Biotechniques (1986) 4:428.

The enzyme-linked immunosorbent assay (ELISA) can be used to 
measure either antigen or antibody concentrations. This method depends upon 
conjugation of an enzyme to either an antigen or an antibody, and uses the bound 
10 enzyme activity as a quantitative label. To measure antibody, the known antigen 
is fixed to a solid phase (e.g., a microtiter dish, plastic cup, dipstick, plastic bead, 
or the like), incubated with test serum dilutions, washed, incubated with anti- 
immunoglobulin labeled with an enzyme, and washed again. Enzymes suitable for 
labeling are known in the art, and include, for example, horseradish peroxidase 
(HRP). Enzyme activity bound to the solid phase is usually measured by adding a 
specific substrate, and determining product formation or substrate utilization 
colorimetrically. The enzyme activity bound is a direct function of the amount of 
antibody bound. 

To measure antigen, a known specific antibody is fixed to the solid 
phase, the test material containing antigen is added, after an incubation the solid 
phase is washed, and a second enzyme-labeled antibody is added. After washing, 
substrate is added, and enzyme activity is measured colorimetrically, and related 
to antigen concentration. 
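Relating measured enzyme activity to antigen concentration usually means reading unknowns off a standard curve prepared from known amounts of antigen. The Python sketch below uses simple linear interpolation between bracketing standards; this is an assumed simplification (four-parameter logistic fits are common in practice), and the function name and standard values are hypothetical.

```python
from bisect import bisect_left


def interpolate_concentration(standards, signal):
    """Estimate concentration from a signal by piecewise-linear interpolation.

    `standards` is a list of (concentration, signal) pairs whose signals
    increase with concentration. Signals outside the curve raise ValueError.
    """
    standards = sorted(standards, key=lambda pair: pair[1])
    signals = [s for _, s in standards]
    if not signals[0] <= signal <= signals[-1]:
        raise ValueError("signal outside the standard curve; dilute and re-assay")
    i = bisect_left(signals, signal)
    if signals[i] == signal:
        return standards[i][0]
    (c_lo, s_lo), (c_hi, s_hi) = standards[i - 1], standards[i]
    return c_lo + (signal - s_lo) * (c_hi - c_lo) / (s_hi - s_lo)


# Hypothetical standards: (ng/mL antigen, absorbance).
curve = [(0, 0.05), (10, 0.25), (20, 0.45), (40, 0.85)]
print(interpolate_concentration(curve, 0.35))
```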

Proteases of the invention may be assayed for activity by cleaving a
substrate which provides detectable cleavage products. As the HCV protease is
believed to cleave itself from the genomic polyprotein, one can employ this
autocatalytic activity both to assay expression of the protein and to determine activity.
For example, if the protease is joined to its fusion partner so that the HCV
protease N-terminal cleavage signal (Arg-Arg) is included, the expression product will
cleave itself into fusion partner and active HCV protease. One may then assay
the products, for example by western blot, to verify that the proteins produced
correspond in size to the separate fusion partner and protease proteins. It is
presently preferred to employ small peptide p-nitrophenyl esters or methylcoumarins,
as cleavage may then be followed by spectrophotometry or fluorescent assays.
Following the method described by E.D. Matayoshi et al, Science (1990) 212:231-35,
one may attach a fluorescent label to one end of the substrate and a quenching
molecule to the other end: cleavage is then determined by measuring the
resulting increase in fluorescence. If a suitable enzyme or antigen has been
employed as the fusion partner, the quantity of protein produced may easily be
determined. Alternatively, one may exclude the HCV protease N-terminal
cleavage signal (preventing self-cleavage) and add a separate cleavage substrate, such
as a fragment of the HCV NS3 domain including the native processing signal or a
synthetic analog.
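For an internally quenched substrate of the kind described above, fluorescence rises as substrate is cleaved, so the fraction cleaved can be estimated by scaling each reading between the uncleaved baseline and a fully digested control. The Python sketch below assumes fluorescence is linear in product (no inner-filter effects) and estimates an initial rate as the least-squares slope of the early, linear part of the time course; the readings, substrate concentration and function names are hypothetical.

```python
def fraction_cleaved(f, f_uncleaved, f_complete):
    """Scale a fluorescence reading to the fraction of substrate cleaved."""
    return (f - f_uncleaved) / (f_complete - f_uncleaved)


def initial_rate(times, fractions, substrate_conc):
    """Least-squares slope of fraction cleaved vs time, times substrate conc."""
    n = len(times)
    mean_t = sum(times) / n
    mean_f = sum(fractions) / n
    sxy = sum((t - mean_t) * (f - mean_f) for t, f in zip(times, fractions))
    sxx = sum((t - mean_t) ** 2 for t in times)
    return substrate_conc * sxy / sxx


# Hypothetical readings (arbitrary units) at 0-3 min with 10 uM substrate.
baseline, complete = 100.0, 1100.0
readings = [100.0, 150.0, 200.0, 250.0]
fractions = [fraction_cleaved(r, baseline, complete) for r in readings]
print(fractions)
print(initial_rate([0, 1, 2, 3], fractions, substrate_conc=10.0), "uM/min")
```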

In the absence of this protease activity, the HCV polyprotein should
remain in its unprocessed form, and thus render the virus noninfectious. Thus,
the protease is useful for assaying pharmaceutical agents for control of HCV, as
compounds which inhibit the protease activity sufficiently will also inhibit viral
infectivity. Such inhibitors may take the form of organic compounds, particularly
compounds which mimic the cleavage site of HCV recognized by the protease.
Three of the putative cleavage sites of the HCV polyprotein have the following
amino acid sequences:

Val-Ser-Ala-Arg-Arg // Gly-Arg-Glu-Ile-Leu-Leu-Gly
Ala-Ile-Leu-Arg-Arg // His-Val-Gly-Pro-
Val-Ser-Cys-Gln-Arg // Gly-Tyr-
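To compare the three sites position by position, the Python sketch below splits each listed sequence at the cleavage mark (//) and reports the residues at P2, P1 and P1' (Schechter-Berger notation: P1 is the residue on the N-terminal side of the scissile bond, P1' the one on the C-terminal side). The site strings are copied from the list above; the function name is illustrative.

```python
SITES = [
    "Val-Ser-Ala-Arg-Arg // Gly-Arg-Glu-Ile-Leu-Leu-Gly",
    "Ala-Ile-Leu-Arg-Arg // His-Val-Gly-Pro-",
    "Val-Ser-Cys-Gln-Arg // Gly-Tyr-",
]


def flanking_residues(site: str):
    """Return (P2, P1, P1') residues around the '//' cleavage mark."""
    left, right = (part.strip() for part in site.split("//"))
    n_side = [r for r in left.split("-") if r]   # drop empty trailing pieces
    c_side = [r for r in right.split("-") if r]
    return n_side[-2], n_side[-1], c_side[0]


for site in SITES:
    p2, p1, p1_prime = flanking_residues(site)
    print(f"P2={p2}  P1={p1}  P1'={p1_prime}")
```

Tabulated this way, all three listed sites have Arg at P1, and the first two also have Arg at P2.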
These sites are characterized by the presence of two basic amino
acids immediately before the cleavage site, and are similar to the cleavage sites
recognized by other flavivirus proteases. Thus, suitable protease inhibitors may be
prepared which mimic the basic/basic/small neutral motif of the HCV cleavage
sites, but substitute a nonlabile linkage for the peptide bond cleaved in the
natural substrate. Suitable inhibitors include peptide trifluoromethyl ketones,
peptide boronic acids, peptide α-ketoesters, peptide difluoroketo compounds,
peptide aldehydes, peptide diketones, and the like. For example, the peptide
aldehyde N-acetyl-phenylalanyl-glycinaldehyde is a potent inhibitor of the protease
papain. One may conveniently prepare and assay large mixtures of peptides using
the methods disclosed in PCT WO89/10931. This application teaches methods for
generating mixtures of peptides up to hexapeptides having all possible amino acid
sequences, and further teaches assay methods for identifying those peptides
capable of binding to proteases.
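The size of such exhaustive peptide mixtures grows as 20^n for peptides of length n, which is why focused sublibraries are attractive. The Python sketch below is only arithmetic and enumeration, not the method of WO89/10931: it counts all sequences up to hexapeptides and enumerates a hypothetical focused tetrapeptide set whose last two positions are restricted to Lys or Arg, mimicking the basic pair preceding the cleavage sites discussed above. The function names are illustrative.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues, one-letter codes


def library_size(max_length: int) -> int:
    """Number of distinct peptides of length 1..max_length."""
    return sum(20 ** n for n in range(1, max_length + 1))


def focused_library(length, constraints):
    """Yield peptides of `length`; `constraints` maps 0-based position -> allowed residues."""
    choices = [constraints.get(i, AMINO_ACIDS) for i in range(length)]
    for combo in product(*choices):
        yield "".join(combo)


print(20 ** 6)            # hexapeptides alone
print(library_size(6))    # all peptides up to hexapeptides
basic_pair = list(focused_library(4, {2: "KR", 3: "KR"}))
print(len(basic_pair), basic_pair[:3])
```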

Other protease inhibitors may be proteins, particularly antibodies
and antibody derivatives. Recombinant expression systems may be used to
generate quantities of protease sufficient for production of monoclonal antibodies
(MAbs) specific for the protease. Suitable antibodies for protease inhibition will
bind to the protease in a manner reducing or eliminating the enzymatic activity,
typically by obscuring the active site. Suitable MAbs may be used to generate
derivatives, such as Fab fragments, chimeric antibodies, altered antibodies,
univalent antibodies, and single domain antibodies, using methods known in the art.

Protease inhibitors are screened using methods of the invention. In
general, a substrate is employed which mimics the enzyme's natural substrate, but
which provides a quantifiable signal when cleaved. The signal is preferably
detectable by colorimetric or fluorometric means; however, other methods such
as HPLC or silica gel chromatography, GC-MS, nuclear magnetic resonance, and
the like may also be useful. After optimum substrate and enzyme concentrations
are determined, a candidate protease inhibitor is added to the reaction mixture at
a range of concentrations. The assay conditions ideally should resemble the
conditions under which the protease is to be inhibited in vivo, i.e., physiologic
pH, temperature, ionic strength, etc. Suitable inhibitors will exhibit strong
protease inhibition at concentrations which do not raise toxic side effects in the
subject. Inhibitors which compete for binding to the protease active site may require
concentrations equal to or greater than the substrate concentration, while
inhibitors capable of binding irreversibly to the protease active site may be added in
concentrations on the order of the enzyme concentration.
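The point about competitive inhibitors follows from the Cheng-Prusoff relation for competitive inhibition, IC50 = Ki(1 + [S]/Km): the more substrate present relative to Km, the more inhibitor is needed for 50% inhibition. The Python sketch below computes percent inhibition from measured rates, estimates IC50 by log-linear interpolation across a dilution series, and applies that relation. The data values and function names are hypothetical, and the interpolation is a simplification of full dose-response curve fitting.

```python
import math


def percent_inhibition(rate_with_inhibitor, rate_without):
    """Percent reduction of the uninhibited rate."""
    return 100.0 * (1.0 - rate_with_inhibitor / rate_without)


def ic50_by_interpolation(concs, inhibitions):
    """Interpolate the 50% point on a log10(concentration) scale.

    `concs` must be increasing and `inhibitions` must cross 50%.
    """
    points = list(zip(concs, inhibitions))
    for (c_lo, i_lo), (c_hi, i_hi) in zip(points, points[1:]):
        if i_lo <= 50.0 <= i_hi:
            if i_hi == i_lo:
                return c_lo
            frac = (50.0 - i_lo) / (i_hi - i_lo)
            log_c = math.log10(c_lo) + frac * (math.log10(c_hi) - math.log10(c_lo))
            return 10 ** log_c
    raise ValueError("inhibition never crosses 50% in this series")


def competitive_ic50(ki, substrate, km):
    """Cheng-Prusoff relation for a competitive inhibitor."""
    return ki * (1.0 + substrate / km)


# Hypothetical dilution series (uM) and measured inhibition (%).
concs = [0.1, 1.0, 10.0, 100.0]
inhib = [10.0, 30.0, 70.0, 90.0]
print(ic50_by_interpolation(concs, inhib))
print(competitive_ic50(ki=1.0, substrate=10.0, km=5.0))
```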

In a presently preferred embodiment, an inactive protease mutein is
employed rather than an active enzyme. It has been found that replacing a
critical residue within the active site of a protease (e.g., replacing the active site
Ser of a serine protease) does not significantly alter the structure of the enzyme,
and thus preserves the binding specificity. The altered enzyme still recognizes and
binds to its proper substrate, but fails to effect cleavage. Thus, in one method of
the invention an inactivated HCV protease is immobilized, and a mixture of
candidate inhibitors added. Inhibitors that closely mimic the enzyme's preferred
recognition sequence will compete more successfully for binding than other
candidate inhibitors. The poorly-binding candidates may then be separated, and the
identity of the strongly-binding inhibitors determined. For example, HCV
protease may be prepared substituting Ala for the active-site Ser (Fig. 1), providing
an enzyme capable of binding the HCV protease substrate, but incapable of
cleaving it. The resulting protease mutein is then bound to a solid support, for
example Sephadex beads, and packed into a column. A mixture of candidate
protease inhibitors in solution is then passed through the column and fractions
collected. The last fractions to elute will contain the strongest-binding compounds,
and provide the preferred protease inhibitor candidates.
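At the DNA level, the Ser-to-Ala substitution described above is made by changing the Ser codon to an Ala codon, ideally with as few base changes as possible so that a single mutagenic oligonucleotide suffices. The Python sketch below picks the Ala codon closest to a given Ser codon using the standard genetic code; it does not know which codon the HCV sequence actually uses at the active site, and the function name is hypothetical.

```python
SER_CODONS = {"TCT", "TCC", "TCA", "TCG", "AGT", "AGC"}
ALA_CODONS = ["GCT", "GCC", "GCA", "GCG"]


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length codons differ."""
    return sum(x != y for x, y in zip(a, b))


def closest_ala_codon(ser_codon: str):
    """Return (Ala codon, number of base changes) closest to a Ser codon."""
    ser_codon = ser_codon.upper()
    if ser_codon not in SER_CODONS:
        raise ValueError(f"{ser_codon} does not encode Ser")
    best = min(ALA_CODONS, key=lambda codon: hamming(ser_codon, codon))
    return best, hamming(ser_codon, best)


for codon in sorted(SER_CODONS):
    print(codon, "->", *closest_ala_codon(codon))
```

TCN serine codons need only a first-position T-to-G change, whereas AGY serine codons need two changes.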
Protease inhibitors may be administered by a variety of methods,
such as intravenously, orally, intramuscularly, intraperitoneally, bronchially,
intranasally, and so forth. The preferred route of administration will depend upon the
nature of the inhibitor. Inhibitors prepared as organic compounds may often be
administered orally (which is generally preferred) if well absorbed. Protein-based
inhibitors (such as most antibody derivatives) must generally be administered by
parenteral routes.



C. Examples

The examples presented below are provided as a further guide to
the practitioner of ordinary skill in the art, and are not to be construed as limiting
the invention in any way.

Example 1

(Preparation of HCV cDNA)

A genomic library of HCV cDNA was prepared as described in PCT
WO89/046699. This library, ATCC accession no. 40394, has
been deposited as set forth below.



Example 2

(Expression of the Polypeptide Encoded in Clone 5-1-1)

(A) The HCV polypeptide encoded within clone 5-1-1 (see
Example 1) was expressed as a fusion polypeptide with human superoxide
dismutase (SOD). This was accomplished by subcloning the clone 5-1-1 cDNA insert
into the expression vector pSODCF1 (K.S. Steimer et al, J Virol (1986) 58:9;
EPO 138,111) as follows. The SOD/5-1-1 expression vector was transformed into
E. coli D1210 cells. These cells, named Cf1/5-1-1 in E. coli, were deposited as set
forth below and have an ATCC accession no. of 67967.






First, DNA isolated from pSODCF1 was treated with BamHI and
EcoRI, and the following linker was ligated into the linear DNA created by the
restriction enzymes:

GAT CCT GGA ATT CTG ATA AG
A CCT TAA GAC TAT TTT AA

After cloning, the plasmid containing the insert was isolated.

Plasmid containing the insert was restricted with EcoRI. The HCV
cDNA insert in clone 5-1-1 was excised with EcoRI, and ligated into this
EcoRI-linearized plasmid DNA. The DNA mixture was used to transform E. coli strain
D1210 (Sadler et al, Gene (1980) 8:279). Recombinants with the 5-1-1 cDNA in
the correct orientation for expressing the ORF shown in Figure 1 were identified
by restriction mapping and nucleotide sequencing.

Recombinant bacteria from one clone were induced to express the
SOD-HCV 5-1-1 polypeptide by growing the bacteria in the presence of IPTG.

Three separate expression vectors, pcf1AB, pcf1CD, and pcf1EF,
were created by ligating three new linkers, AB, CD, and EF, to a BamHI-EcoRI
fragment derived by digesting the vector pSODCF1 to completion with EcoRI and
BamHI, followed by treatment with alkaline phosphatase. The linkers were
created from six oligomers, A, B, C, D, E, and F. Each oligomer was
phosphorylated by treatment with kinase in the presence of ATP prior to annealing to its
complementary oligomer. The sequences of the synthetic linkers were the following:
A   GATC CTG AAT TCC TGA TAA
B        GAC TTA AGG ACT ATT TTA A

C   GATC CGA ATT CTG TGA TAA
D        GCT TAA GAC ACT ATT TTA A

E   GATC CTG GAA TTC TGA TAA
F        GAC CTT AAG ACT ATT TTA A



Each of the three linkers destroys the original EcoRI site, and
creates a new EcoRI site within the linker, but within a different reading frame.
Thus, the HCV cDNA EcoRI fragments isolated from the clones, when inserted
into the expression vector, were in three different reading frames.
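The frame-shifting design of the three linkers can be checked directly from the sequences listed above. The Python sketch below (function names are illustrative) confirms that each lower strand pairs with its upper strand, locates the new EcoRI site (GAATTC) in each upper strand, and reports its offset modulo 3 from the first base of the linker; distinct offsets mean inserted EcoRI fragments land in different frames. It also models the ligated junction with the vector's EcoRI end, assumed here to contribute 5'-AATTC, to show that the original site is not regenerated.

```python
# Upper strands 5'->3' and lower strands written 3'->5', as in the listing above.
LINKERS = {
    "AB": ("GATCCTGAATTCCTGATAA", "GACTTAAGGACTATTTTAA"),
    "CD": ("GATCCGAATTCTGTGATAA", "GCTTAAGACACTATTTTAA"),
    "EF": ("GATCCTGGAATTCTGATAA", "GACCTTAAGACTATTTTAA"),
}
PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}


def strands_pair(upper: str, lower: str) -> bool:
    """Upper (after its 5' GATC overhang) must pair base-for-base with lower,
    which ends in TTAA when written 3'->5' (its 5' AATT overhang)."""
    duplex = upper[4:]
    return (len(lower) == len(duplex) + 4 and lower.endswith("TTAA")
            and all(PAIR[u] == l for u, l in zip(duplex, lower)))


def ecori_frame(upper: str) -> int:
    """Offset of the new EcoRI site from the linker's first base, modulo 3."""
    return upper.index("GAATTC") % 3


for name, (upper, lower) in LINKERS.items():
    junction = upper + "AATTC"  # linker joined to the vector's EcoRI end
    print(name, strands_pair(upper, lower), ecori_frame(upper),
          junction.count("GAATTC"))
```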

The HCV cDNA fragments in the designated λgt11 clones were
excised by digestion with EcoRI; each fragment was inserted into pcf1AB,
pcf1CD, and pcf1EF. These expression constructs were then transformed into
E. coli D1210 cells, the transformants cloned, and polypeptides expressed as
described in part B below.

(B) Expression products of the indicated HCV cDNAs were
tested for antigenicity by direct immunological screening of the colonies, using a
modification of the method described in Helfman et al, Proc Nat Acad Sci USA
(1983) 80:31. Briefly, the bacteria were plated onto nitrocellulose filters overlaid
on ampicillin plates to give approximately 40 colonies per filter. Colonies were
replica plated onto nitrocellulose filters, and the replicas were regrown overnight
in the presence of 2 mM IPTG and ampicillin. The bacterial colonies were lysed
by suspending the nitrocellulose filters for about 15 to 20 min in an atmosphere
saturated with CHCl3 vapor. Each filter was then placed in an individual 100 mm
Petri dish containing 10 mL of 50 mM Tris HCl, pH 7.5, 150 mM NaCl, 5 mM
MgCl2, 3% (w/v) BSA, 40 µg/mL lysozyme, and 0.1 µg/mL DNase. The plates
were agitated gently for at least 8 hours at room temperature. The filters were
rinsed in TBST (50 mM Tris HCl, pH 8.0, 150 mM NaCl, 0.005% Tween 20).
After incubation, the cell residues were rinsed and incubated for one hour in TBS
(TBST without Tween) containing 10% sheep serum. The filters were then
incubated with pretreated sera in TBS from individuals with NANBH, which
included 3 chimpanzees; 8 patients with chronic NANBH whose sera were
positive with respect to antibodies to HCV C100-3 polypeptide (also called C100); 8
patients with chronic NANBH whose sera were negative for anti-C100 antibodies;
a convalescent patient whose serum was negative for anti-C100 antibodies; and 6
patients with community-acquired NANBH, including one whose serum was strongly
positive with respect to anti-C100 antibodies, and one whose serum was marginally
positive with respect to anti-C100 antibodies. The sera, diluted in TBS, were
pretreated by preabsorption with hSOD for at least 30 minutes at 37°C. After
incubation, the filters were washed twice for 30 min with TBST. The expressed
proteins which bound antibodies in the sera were labeled by incubation for 2 hours
with 125I-labeled sheep anti-human antibody. The filters were then
washed twice for 30 min with TBST, dried, and autoradiographed.
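Preparing the 10 mL lysis solution described above is a C1V1 = C2V2 calculation for each component. The Python sketch below assumes hypothetical stock concentrations (1 M Tris HCl, 5 M NaCl, 1 M MgCl2, 30% BSA, 10 mg/mL lysozyme, 1 mg/mL DNase); substitute whatever stocks are actually on hand. The recipe keys and function name are illustrative.

```python
# component: (final concentration, assumed stock concentration), same units per row.
RECIPE = {
    "Tris HCl pH 7.5 (mM)": (50.0, 1000.0),
    "NaCl (mM)": (150.0, 5000.0),
    "MgCl2 (mM)": (5.0, 1000.0),
    "BSA (% w/v)": (3.0, 30.0),
    "lysozyme (ug/mL)": (40.0, 10000.0),
    "DNase (ug/mL)": (0.1, 1000.0),
}


def stock_volumes(recipe, final_volume_ul):
    """Volume of each stock (uL) from C1*V1 = C2*V2, plus water to volume."""
    volumes = {name: final * final_volume_ul / stock
               for name, (final, stock) in recipe.items()}
    water = final_volume_ul - sum(volumes.values())
    if water < 0:
        raise ValueError("stocks too dilute for this final volume")
    volumes["water"] = water
    return volumes


for name, ul in stock_volumes(RECIPE, 10_000.0).items():  # 10 mL per Petri dish
    print(f"{name:<22} {ul:8.1f} uL")
```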

Example 3

(Cloning of Full-Length SOD-Protease Fusion Proteins)

(A) pBR322-C200:

The nucleotide sequences of the HCV cDNAs used below were
determined essentially as described above, except that the cDNA excised from
these phages was substituted for the cDNA isolated from clone 5-1-1.

Clone C33c was isolated using a hybridization probe having the
following sequence:

5' ATC AGG ACC GGG GTG AGA ACA ATT ACC ACT 3'






The sequence of the HCV cDNA in clone C33c is shown in Figure 8, which also
shows the amino acids encoded therein.

Clone 35 was isolated by screening with a synthetic polynucleotide
having the sequence:

5' AAG CCA CCG TGT GCG CTA GGG CTC AAG CCC 3'

Approximately 1 in 50,000 clones hybridized with the probe. The polynucleotide
and deduced amino acid sequences for C35 are shown in Figure 7.

Clone C31 is shown in Figure 6, which also shows the amino acids
encoded therein. A C200 cassette was constructed by ligating together a 718 bp
fragment obtained by digestion of clone C33c DNA with EcoRI and HinfI, a 179
bp fragment obtained by digestion of clone C31 DNA with HinfI and BglI, and a
377 bp fragment obtained by digesting clone C35 DNA with BglI and EcoRI. The
ligated fragments were inserted into the EcoRI site of pBR322, yielding
the plasmid pBR322-C200.
(B) C7f+C20c:

Clone 7f was isolated using a probe having the sequence:

5'-AGC AGA CAA GGG GCC TCC TAG GGT GCA TAA T-3'

The sequence of HCV cDNA in clone 7f and the amino acids encoded therein are
shown in Figure 5.

Clone C20c was isolated using a probe having the following sequence:

5'-TGC ATC AAT GGG GTG TGC TGG-3'

The sequence of HCV cDNA in clone C20c, and the amino acids
encoded therein, are shown in Figure 2.

Clones 7f and C20c were digested with EcoRI and SfaNI to form
400 bp and 260 bp fragments, respectively. The fragments were then cloned into
the EcoRI site of pBR322 to form the vector C7f+C20c and transformed into
HB101 cells.
(C)

Clone 8h was isolated using a probe based on the sequence of
nucleotides in clone C33c. The nucleotide sequence of the probe was

5'-AGA GAC AAC CAT GAG GTC CCC GGT GTT C-3'

The sequence of the HCV cDNA in clone 8h, and the amino acids encoded
therein, are shown in Figure 4.

Clone C26d was isolated using a probe having the following sequence:

5'-CTG TTG TGC CCC GCG GCA GCC-3'

The sequence and amino acid translation of clone C26d is shown in
Figure 3.

Clones C26d and C33c (see part A above) were transformed into
the methylation-minus E. coli strain GM48. Clone C26d was digested with
EcoRII and DdeI to provide a 100 bp fragment. Clone C33c was digested with
EcoRII and EcoRI to provide a 700 bp fragment. Clone C8h was digested with
EcoRI and DdeI to provide a 208 bp fragment. These three fragments were then
ligated into the EcoRI site of pBR322, and transformed into E. coli HB101, to
provide the vector C300.

(D) Preparation of Full Length Clones:

A 600 bp fragment was obtained from C7f+C20c by digestion with
EcoRI and NaeI, and ligated to a 945 bp NaeI/EcoRI fragment from C300, and
the construct was inserted into the EcoRI site of pGEM4Z (commercially available
from Promega) to form the vector C7fC20cC300.

C7fC20cC300 was digested with NdeI and EcoRI to provide an 892
bp fragment, which was ligated with a 1160 bp fragment obtained by digesting
C200 with NdeI and EcoRI. The resulting construct was inserted into the EcoRI
site of pBR322 to provide the vector C7fC20cC300C200. Construction of this
vector is illustrated schematically in Figure 9.
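The fragment sizes quoted in this example can be cross-checked by simple addition. The Python sketch below sums the stated sizes for each assembly, ignoring the few nucleotides of overhang at each junction (an approximation); the dictionary labels are illustrative.

```python
# Fragment sizes (bp) stated in Example 3 for each assembly.
ASSEMBLIES = {
    "C200 cassette": [718, 179, 377],        # C33c EcoRI/HinfI + C31 HinfI/BglI + C35 BglI/EcoRI
    "C300": [100, 700, 208],                 # C26d + C33c + C8h fragments
    "C7fC20cC300 insert": [600, 945],        # C7f+C20c EcoRI/NaeI + C300 NaeI/EcoRI
    "C7fC20cC300C200 insert": [892, 1160],   # NdeI/EcoRI pieces from each parent
}

for name, fragments in ASSEMBLIES.items():
    print(f"{name:<24} {sum(fragments):>5} bp")
```

The 1545 bp and 2052 bp totals appear consistent with the approximately 1550 bp and 2000 bp EcoRI fragments taken from C7fC20cC300 and C7fC20cC300C200 in Example 4.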
Example 4

(Preparation of E. coli Expression Vectors)

(A) cf1SODp600:

This vector contains a full-length HCV protease coding sequence
fused to a functional hSOD leader. The vector C7fC20cC300C200 was cleaved
with EcoRI to provide a 2000 bp fragment, which was then ligated into the EcoRI
site of plasmid cf1CD (Example 2A). The resulting vector encodes amino acids
1-151 of hSOD, and amino acids 946-1630 of HCV (numbered from the beginning
of the polyprotein, corresponding to amino acids 1-686 in Figure 1). The vector
was labeled cf1SODp600 (sometimes referred to as P600), and was transformed
into E. coli D1210 cells. These cells, ATCC accession no. 68275, were deposited
as set forth below.

(B) P190:

A truncated SOD-protease fusion polynucleotide was prepared by
excising a 600 bp EcoRI/NaeI fragment from C7f+C20c, blunting the fragment
with Klenow fragment, and ligating the blunted fragment into the Klenow-blunted
EcoRI site of cflEF (Example 2A). This polynucleotide encodes a fusion protein
having amino acids 1-151 of hSOD and amino acids 1-199 of HCV protease.

(C) P300:

A longer truncated SOD-protease fusion polynucleotide was
prepared by excising an 892 bp EcoRI/NdeI fragment from C7fC20cC300, blunting
the fragment with Klenow fragment, and ligating the blunted fragment into the
Klenow-blunted EcoRI site of cflEF. This polynucleotide encodes a fusion
protein having amino acids 1-151 of hSOD and amino acids 1-299 of HCV protease.

(D) P500:

A longer truncated SOD-protease fusion polynucleotide was
prepared by excising a 1550 bp EcoRI/EcoRI fragment from C7fC20cC300 and
ligating the fragment into the EcoRI site of cflCD to form P500. This polynucleotide
encodes a fusion protein having amino acids 1-151 of hSOD and amino acids
946-1457 of HCV protease (amino acids 1-513 in Figure 1).

(E) FLAG/Protease Fusion 

This vector contains a full-length HCV protease coding sequence
fused to the FLAG sequence (Hopp et al. (1988) Biotechnology 6: 1204-1210).
PCR was used to produce an HCV protease gene with convenient restriction ends
for cloning. Plasmid P500 was digested with EcoRI and NdeI to yield a 900 bp
fragment. This fragment and two primers were used in a polymerase chain
reaction to introduce a unique BglII site at amino acid 1009 and a stop codon
with a SalI site at amino acid 1262 of HCV-1, as shown in Figure 17 of WO
90/11089, published 4 October 1990. The sequences of the primers are as follows:

5' CCC GAG CAA GAT CTC CCG GCC C 3'
and 

5' CCC GGC TGC ATA AGC AGT CGA CTT GGA 3'
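The two primers carry the engineered restriction sites. A minimal sketch that locates them follows; it uses the primer sequences as printed (spaces removed) and the standard BglII (AGATCT) and SalI (GTCDAC is not used here; GTCGAC) recognition sequences. Both sites are palindromic, so scanning one strand is sufficient. The helper name `find_sites` is illustrative.

```python
# Recognition sequences (5'->3'); both are palindromes, so one strand suffices.
SITES = {"BglII": "AGATCT", "SalI": "GTCGAC"}

# Primers as printed in the text, with spaces removed.
PRIMERS = {
    "primer 1": "CCCGAGCAAGATCTCCCGGCCC",
    "primer 2": "CCCGGCTGCATAAGCAGTCGACTTGGA",
}


def find_sites(seq: str, sites: dict[str, str]) -> dict[str, list[int]]:
    """Return 0-based start positions of every recognition site present in seq."""
    hits: dict[str, list[int]] = {}
    for enzyme, motif in sites.items():
        positions = [
            i for i in range(len(seq) - len(motif) + 1) if seq.startswith(motif, i)
        ]
        if positions:
            hits[enzyme] = positions
    return hits


for name, seq in PRIMERS.items():
    print(name, find_sites(seq, SITES))
```

Each primer contains exactly one of the two sites, which matches the stated design: BglII at the 5' end of the amplified region and SalI next to the stop codon.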
After 30 cycles of PCR, the reaction was digested with BglII and SalI, and the 710
bp fragment was isolated. This fragment was annealed and ligated to the
following duplex:

Met Asp Tyr Lys Asp Asp Asp Asp Lys Gly Arg Glu
CATGGACTACAAAGACGATGACGATAAAGGCCGGGA

CTGATGTTTCTGCTACTGCTATTTCCGGCCCTCTAG

The duplex encodes the FLAG sequence, an initiator methionine, and a 5' NcoI
restriction site. The resulting NcoI/SalI fragment was ligated into a derivative of
pCF1.
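To check that the linker duplex encodes Met followed by the FLAG peptide (Asp-Tyr-Lys-Asp-Asp-Asp-Asp-Lys), the top strand can be translated from the ATG inside the NcoI overhang. This is a sketch. Where the printed bases are unclear, the strands used below are an assumption, chosen so that they base-pair. The bottom strand is written 3'->5' beneath the top strand and offset by the 4-nt overhang. The code table is the standard genetic code.

```python
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ..., GGG.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def translate(dna: str) -> str:
    """Translate complete codons in frame 0; a trailing partial codon is ignored."""
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3))


# Assumed reading of the duplex: top strand 5'->3'; bottom strand written 3'->5'.
TOP = "CATGGACTACAAAGACGATGACGATAAAGGCCGGGA"
BOTTOM_3_TO_5 = "CTGATGTTTCTGCTACTGCTATTTCCGGCCCTCTAG"

paired = TOP[4:]  # skip the 4-nt NcoI overhang (CATG)
assert paired.translate(COMPLEMENT) == BOTTOM_3_TO_5[:len(paired)]  # strands pair
print(translate(TOP[1:]))           # frame starts at the ATG within CATG
print(BOTTOM_3_TO_5[len(paired):])  # 3'-CTAG-5', i.e. a 5'-GATC (BglII) overhang
```

The translation stops at Gly-Arg because the top strand ends in "GA". That final Glu codon is completed to GAG by the first base of the ligated BglII fragment, as shown on the amino-acid line above the duplex.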

This construct is then transformed into E. coli D1210 cells and expression 
of the protease is induced by the addition of IPTG. 

The FLAG sequence was fused to the HCV protease to facilitate
purification. A calcium-dependent monoclonal antibody, which binds to the
FLAG-encoded peptide, is used to purify the fusion protein without harsh elution
conditions.

Example 5 

(E. coli Expression of SOD-Protease Fusion Proteins)

(A) E. coli D1210 cells were transformed with cflSODp600 and grown in
Luria broth containing 100 µg/mL ampicillin to an OD of 0.3-0.5. IPTG was then
added to a concentration of 2 mM, and the cells were cultured to a final OD of 0.9
to 1.3. The cells were then lysed, and the lysate was analyzed by Western blot using
anti-HCV sera, as described in USSN 7/456,637.

The results indicated that cleavage occurred, as no full-length product
(theoretical Mr 93 kDa) was evident on the gel. Bands corresponding to the
hSOD fusion partner and the separate HCV protease appeared at relative
molecular weights of about 34, 53, and 66 kDa. The 34 kDa band corresponds to the
hSOD partner (about 20 kDa) with a portion of the NS3 domain, while the 53
and 66 kDa bands correspond to HCV protease with varying degrees of (possibly
bacterial) processing.

(B) E. coli D1210 cells were transformed with P500 and grown in Luria
broth containing 100 µg/mL ampicillin to an OD of 0.3-0.5. IPTG was then
added to a concentration of 2 mM, and the cells were cultured to a final OD of 0.8
to 1.0. The cells were then lysed, and the lysate was analyzed as described above.

The results again indicated that cleavage occurred, as no full-length
product (theoretical Mr 73 kDa) was evident on the gel. Bands corresponding to
the hSOD fusion partner and the truncated HCV protease appeared at molecular
weights of about 34 and 45 kDa, respectively.

(C) E. coli D1210 cells were transformed with vectors P300 and P190
and grown as described above.
The results from P300 expression indicated that cleavage occurred, as
no full-length product (theoretical Mr 51 kDa) was evident on the gel. A band
corresponding to the hSOD fusion partner appeared at a relative molecular
weight of about 34 kDa. The corresponding HCV protease band was not visible, as
this region of the NS3 domain is not recognized by the sera employed to detect
the products. However, the appearance of the hSOD band at 34 kDa rather than 51
kDa indicates that cleavage occurred.

The P190 expression product appeared only as the full-length (encoded)
product without cleavage, forming a band at about 40 kDa, which corresponds to
the theoretical molecular weight of the uncleaved product. This may indicate
that the minimum essential sequence for HCV protease extends into the region
between amino acids 199 and 299.
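The theoretical Mr values quoted for the uncleaved fusions can be roughly reproduced from residue counts alone. This sketch assumes the common rule of thumb of about 110 Da per residue and ignores linker residues and actual amino acid composition. The residue counts are taken from Example 4: hSOD 1-151 plus HCV residues 1-686, 1-513, 1-299 or 1-199.

```python
AVG_RESIDUE_KDA = 0.110  # assumed average residue mass (~110 Da), a rule of thumb
HSOD_RESIDUES = 151      # hSOD amino acids 1-151 in every fusion

# construct -> (HCV protease residues encoded, theoretical Mr reported in the text, kDa)
FUSIONS = {
    "cflSODp600": (686, 93),
    "P500": (513, 73),
    "P300": (299, 51),
    "P190": (199, 40),
}


def estimated_mr_kda(hcv_residues: int) -> float:
    """Rough Mr of an hSOD(1-151)/HCV protease fusion from residue count alone."""
    return (HSOD_RESIDUES + hcv_residues) * AVG_RESIDUE_KDA


for name, (hcv_residues, reported) in FUSIONS.items():
    estimate = estimated_mr_kda(hcv_residues)
    print(f"{name}: ~{estimate:.1f} kDa estimated vs {reported} kDa reported")
```

Each estimate lands within about 2 kDa of the reported value. This is why the absence of a band near those sizes, together with the appearance of the 34 kDa hSOD-containing band, is read as evidence of cleavage.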



Example 6 

(Purification of E. coli-Expressed Protease)

The HCV protease and fragments expressed in Example 5 may be purified 
as follows: 

The bacterial cells in which the polypeptide was expressed are subjected to
osmotic shock and mechanical disruption; the insoluble fraction containing the
protease is isolated and subjected to differential extraction with an alkaline NaCl
solution; and the polypeptide in the extract is purified by chromatography on
columns of S-Sepharose* and Q-Sepharose*.

The crude extract resulting from osmotic shock and mechanical disruption
is prepared by suspending 1 g of packed cells in 10 mL of a solution
containing 0.02 M Tris-HCl, pH 7.5, 10 mM EDTA, and 20% sucrose, and incubating
for 10 minutes on ice. The cells are then pelleted by centrifugation at 4,000 x g
for 15 min at 4°C. After the supernatant is removed, the cell pellets are
resuspended in 10 mL of Buffer A1 (0.01 M Tris-HCl, pH 7.5, 1 mM EDTA, 14 mM
β-mercaptoethanol ("BME")) and incubated on ice for 10 minutes. The cells are again
pelleted at 4,000 x g for 15 minutes at 4°C. After removal of the clear
supernatant (periplasmic fraction I), the cell pellets are resuspended in Buffer A1,
incubated on ice for 10 minutes, and again centrifuged at 4,000 x g for 15 minutes
at 4°C. The clear supernatant (periplasmic fraction II) is removed, and the cell
pellet is resuspended in 5 mL of Buffer A2 (0.02 M Tris-HCl, pH 7.5, 14 mM BME,
1 mM EDTA, 1 mM PMSF). To disrupt the cells, the suspension (5 mL) and 15 mL of
Dyno-mill lead-free, acid-washed glass beads (0.10-0.15 mm diameter) (available
from Glen-Mills, Inc.) are placed in a Falcon tube and vortexed at top speed for
two minutes, followed by cooling for at least 2 min on ice. The
vortexing-cooling procedure is repeated another four times. After vortexing, the
slurry is filtered through a sintered glass funnel using low suction, the glass beads
are washed twice with Buffer A2, and the filtrate and washes are combined.
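The protocol specifies centrifugal force (x g), but many centrifuges are set in rpm. The sketch below uses the standard relation RCF = 1.118 × 10⁻⁵ × r × rpm², with r in cm. The rotor radius is a hypothetical value; substitute the radius of the rotor actually used.

```python
import math

K = 1.118e-5  # standard constant for radius in cm and speed in rpm


def rcf_from_rpm(rpm: float, radius_cm: float) -> float:
    """Relative centrifugal force (x g) at a given speed and rotor radius."""
    return K * radius_cm * rpm ** 2


def rpm_for_rcf(rcf: float, radius_cm: float) -> float:
    """Speed (rpm) needed to reach a target force (x g) at a given rotor radius."""
    return math.sqrt(rcf / (K * radius_cm))


ROTOR_RADIUS_CM = 10.0  # hypothetical rotor radius; use your rotor's value
for rcf in (4_000, 20_000):  # forces used in this example
    print(f"{rcf} x g ~ {rpm_for_rcf(rcf, ROTOR_RADIUS_CM):.0f} rpm at r = {ROTOR_RADIUS_CM} cm")
```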

The insoluble fraction of the crude extract is collected by centrifugation at
20,000 x g for 15 min at 4°C, washed twice with 10 mL Buffer A2, and
resuspended in 5 mL of Milli-Q water.

A fraction containing the HCV protease is isolated from the insoluble
material by adding NaOH (2 M) and NaCl (2 M) to the suspension to yield a
final concentration of 20 mM each, vortexing the mixture for 1 minute, centrifuging
it at 20,000 x g for 20 min at 4°C, and retaining the supernatant.
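The volume of each 2 M stock needed to reach 20 mM follows from a mass balance. This sketch assumes the 5 mL suspension described above and equal volumes v of the two stocks. Each solute then ends at 2 M × v / (5 mL + 2v), which solves to v = C_f × V₀ / (C_s − 2C_f). The function name is illustrative.

```python
def stock_volume_ml(initial_ml: float, stock_m: float, target_m: float,
                    n_stocks: int = 2) -> float:
    """Volume of EACH stock to add so every solute reaches target_m.

    Assumes n_stocks solutions of equal molarity are added in equal volumes v,
    so each solute ends at stock_m * v / (initial_ml + n_stocks * v) == target_m.
    """
    return target_m * initial_ml / (stock_m - n_stocks * target_m)


v = stock_volume_ml(initial_ml=5.0, stock_m=2.0, target_m=0.020)
final_ml = 5.0 + 2 * v
print(f"add {v * 1000:.1f} uL of each 2 M stock; final volume {final_ml:.3f} mL")
```

The result, about 51 µL of each stock, differs only slightly from the 50 µL obtained by ignoring the volume added.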

The partially purified protease is then purified by SDS-PAGE. The
protease may be identified by Western blot, and the band excised from the gel. The
protease is then eluted from the band and analyzed to confirm its amino acid
sequence. N-terminal sequences may be analyzed using an automated amino acid
sequencer, while C-terminal sequences may be analyzed by automated amino acid
sequencing of a series of tryptic fragments.
*Trademark
Example 7
(Preparation of Yeast Expression Vector)

(A) P650 (SOD/Protease Fusion)

This vector contains an HCV sequence, which includes the wild-type
full-length HCV protease coding sequence, fused at the 5' end to a SOD coding
sequence. Two fragments, a 441 bp EcoRI/BglII fragment from clone 11b and a
1471 bp BglII/EcoRI fragment from expression vector P500, were used to
reconstruct a wild-type, full-length HCV protease coding sequence. These two
fragments were ligated together with an EcoRI-digested pS356 vector to produce
an expression cassette. The expression cassette encodes the ADH2/GAPDH
hybrid yeast promoter, human SOD, the HCV protease, and a GAPDH
transcription terminator. The resulting vector was digested with BamHI, and a
4052 bp fragment was isolated. This fragment was ligated to the BamHI-digested
pAB24 vector to produce p650. p650 expresses a polyprotein containing, from its
amino-terminal end, amino acids 1-154 of hSOD, an oligopeptide
-Asn-Leu-Gly-Ile-Arg-, and amino acids 819 to 1458 of HCV-1, as shown in
Figure 17 of WO 90/11089, published 4 October 1990.

Clone 11b was isolated from the genomic library of HCV cDNA, ATCC
accession no. 40394, as described above in Example 3A, using a hybridization
probe having the following sequence:

5' CAC CTA TGT TTA TAA CCA TCT CAC TCC TCT 3'.
This procedure is also described in EPO Pub. No. 318 216, Example IV.A.17.
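The hybridization probe used for clone 11b (read here as "11b") can be located on the start of the Figure 1 coding sequence by a simple exact-match search on both strands. This sketch uses the first 16 codons of the Figure 1 top strand, with spaces removed. `locate_probe` is an illustrative helper. It finds exact matches only and does not model hybridization stringency.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def locate_probe(probe: str, target: str):
    """Return ('+', i) if probe occurs in target at i, ('-', i) if its reverse
    complement does, or None if neither strand contains an exact match."""
    i = target.find(probe)
    if i >= 0:
        return ("+", i)
    i = target.find(reverse_complement(probe))
    if i >= 0:
        return ("-", i)
    return None


PROBE_11B = "CACCTATGTTTATAACCATCTCACTCCTCT"
# First 16 codons of the Figure 1 top strand (spaces removed).
FIG1_START = "ATTCGCGGCACCTATGTTTATAACCATCTCACTCCTCTTCGGGACTGG"
print(locate_probe(PROBE_11B, FIG1_START))
```

The probe reads the same as the printed top strand starting in the Thr codon (ACC) near the beginning of Figure 1. It would therefore anneal to the complementary strand of that region.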

The vector pS3EF, which is a pBR322 derivative, contains the
ADH2/GAPDH hybrid yeast promoter upstream of the human superoxide
dismutase gene, an adaptor, and a downstream yeast-effective transcription
terminator. A similar expression vector containing these control elements and the
superoxide dismutase gene is described in Cousens et al. (1987) Gene 61: 265, and
in copending application EPO 196,056, published October 1, 1986. pS3EF,
however, differs from that in Cousens et al. in that the heterologous proinsulin
gene and the immunoglobulin hinge are deleted, and Gln154 of SOD is followed by
an adaptor sequence which contains an EcoRI site. The sequence of the adaptor is:

5' AAT TTG GGA ATT CCA TAA TTA ATT AAG 3'

3' AC CCT TAA GGT ATT AAT TAA TTC AGCT 5'

The EcoRI site facilitates the insertion of heterologous sequences. Once inserted
into pS3EF, a SOD fusion is expressed which contains an oligopeptide that links
SOD to the heterologous sequences. pS3EF is identical to pS356 except that
pS356 contains a different adaptor. The sequence of that adaptor is shown
below:

5' AAT TTG GGA ATT CCA TAA TGA G 3'

3' AC CCT TAA GGT ATT ACT CAG CT 5'

pS356, ATCC accession no. 67683, is deposited as set forth below. 
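Both adaptors can be checked for correct annealing and for the EcoRI site (GAATTC) that receives heterologous inserts. In this sketch, the base calls where the printed strands are unclear are an assumption, chosen so that the strands pair. Each bottom strand is written 3'->5' and starts after the top strand's 4-nt AATT overhang. `check_adaptor` is an illustrative helper.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def check_adaptor(top_5to3: str, bottom_3to5: str, overhang: int = 4) -> str:
    """Confirm the strands pair once the top strand's 5' overhang is skipped.

    Returns the bottom strand's unpaired 3'->5' tail (the other sticky end).
    """
    paired = top_5to3[overhang:]
    if paired.translate(COMPLEMENT) != bottom_3to5[:len(paired)]:
        raise ValueError("strands are not complementary")
    return bottom_3to5[len(paired):]


ADAPTORS = {  # assumed readings of the two printed adaptors
    "pS3EF": ("AATTTGGGAATTCCATAATTAATTAAG", "ACCCTTAAGGTATTAATTAATTCAGCT"),
    "pS356": ("AATTTGGGAATTCCATAATGAG", "ACCCTTAAGGTATTACTCAGCT"),
}

for name, (top, bottom) in ADAPTORS.items():
    tail = check_adaptor(top, bottom)
    print(name, "| EcoRI site at", top.find("GAATTC"), "| bottom 3'->5' tail:", tail)
```

Under this reading, both adaptors pair fully and end in a 3'-AGCT-5' tail, i.e. a 5'-TCGA overhang of the kind left by SalI or XhoI.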

Plasmid pAB24 is a yeast shuttle vector, which contains pBR322
sequences, the complete 2 micron sequence for DNA replication in yeast (Broach
(1981) in: Molecular Biology of the Yeast Saccharomyces, Vol. 1, p. 445, Cold Spring
Harbor Press), and the yeast LEU2d gene derived from plasmid pC1/1, described
in EPO Pub. No. 116 201. Plasmid pAB24 was constructed by digesting YEp24
with EcoRI and re-ligating the vector to remove the partial 2 micron sequences.
The resulting plasmid, YEp24deltaRI, was linearized with ClaI and ligated with
the complete 2 micron plasmid, which had been linearized with ClaI. The
resulting plasmid, pCBou, was then digested with XbaI, and the 8605 bp vector
fragment was gel-isolated. This isolated XbaI fragment was ligated with a 4460
bp XbaI fragment containing the LEU2d gene isolated from pC1/1; the orientation
of the LEU2d gene is the same as that of the URA3 gene.

S. cerevisiae 2150-2-3 (pAB24-GAP-env2), accession no. 20827, is
deposited with the American Type Culture Collection as set forth below. The
plasmid pAB24-GAP-env2 can be recovered from the yeast cells by known
techniques. The GAP-env2 expression cassette can be removed by digesting
pAB24-GAP-env2 with BamHI. pAB24 is recovered by religating the vector
without the BamHI insert.

Example 8

(Yeast Expression of SOD-Protease Fusion Protein) 
p650 was transformed into S. cerevisiae strain JSC310 (MATa, leu2,
ura3-52, prb1-1122, pep4-3, prc1-407, cir°, DM15 [G418 resistance]). The
transformation was performed as described by Hinnen et al. (1978) Proc Natl Acad
Sci USA 75: 1929. The transformed cells were selected on ura- plates with 8%
glucose. The plates were incubated at 30°C for 4-5 days. The transformants were
further selected on leu- plates with 8% glucose, putatively for high copy numbers of
the p650 plasmid. Colonies from the leu- plates were inoculated into leu- medium
with 3% glucose. These cultures were shaken at 30°C for 2 days and then diluted
1/20 into YEPD medium with 2% glucose and shaken for 2 more days at 30°C.

S. cerevisiae JSC310 contains DM15 DNA, described in EPO Pub.
No. 340 986, published 8 November 1989. This DM15 DNA enhances
ADH2-regulated expression of heterologous proteins. pDM15, accession no. 40453,
is deposited with the American Type Culture Collection as set forth below.

Example 9 

(Yeast Ubiquitin Expression of Mature HCV Protease) 
Mature HCV protease is prepared by cleaving vector
C7fC20cC300C200 with EcoRI to obtain a 2 kb coding sequence, and inserting
the sequence with the appropriate linkers into a ubiquitin expression vector, such
as that described in WO 88/02406, published 7 April 1988.

Mature HCV protease is recovered upon expression of the vector in suitable
hosts, particularly yeast. Specifically, the yeast expression protocol described in
Example 8 is used to express a ubiquitin/HCV protease vector.

Example 10 

(Preparation of an In-Vitro Expression Vector)

(A) pGEM®-3Z/Yellow Fever Leader Vector

Four synthetic DNA fragments were annealed and ligated
together to create an EcoRI/SacI Yellow Fever leader, which was ligated to an
EcoRI/SacI-digested pGEM®-3Z vector from Promega. The sequences of the
four fragments are listed below:
YFK-1:

5' AAT TCG TAA ATC CTG TGT GCT AAT TGA GGT GCA TTG GTC
TGC AAA TCG AGT TGC TAG GCA ATA AAC ACA TT 3'

YFK-2:

5' TAT TGC CTA GCA ACT CGA TTT GCA GAC CAA TGC ACC TCA ATT
AGC ACA CAG GAT TTA CG 3'
YFK-3:

5' TGG ATT AAT TTT AAT CGT TCG TTG AGC GAT TAG CAG AGA
ACT GAC CAG AAC ATG TCT GAG CT 3'
YFK-4:

5' CAG ACA TGT TCT GGT CAG TTC TCT GCT AAT CGC TCA ACG AAC
GAT TAA AAT TAA TCC AAA TGT GTT 3'.
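The four oligonucleotides can be checked to confirm that they assemble into one EcoRI...SacI duplex. In this sketch, YFK-1 and YFK-3 form the top strand, while YFK-2 and YFK-4 form the bottom strand. The 8-nt overlap at the junction is read from the sequences themselves, and the helper name `rc` is illustrative.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def rc(seq: str) -> str:
    """Reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


YFK = {  # oligos as listed above, spaces removed
    "YFK-1": "AATTCGTAAATCCTGTGTGCTAATTGAGGTGCATTGGTCTGCAAATCGAGTTGCTAGGCAATAAACACATT",
    "YFK-2": "TATTGCCTAGCAACTCGATTTGCAGACCAATGCACCTCAATTAGCACACAGGATTTACG",
    "YFK-3": "TGGATTAATTTTAATCGTTCGTTGAGCGATTAGCAGAGAACTGACCAGAACATGTCTGAGCT",
    "YFK-4": "CAGACATGTTCTGGTCAGTTCTCTGCTAATCGCTCAACGAACGATTAAAATTAATCCAAATGTGTT",
}
JOIN = 8  # overlap between the 3' ends of YFK-1 and YFK-4

# Left half: YFK-2 pairs with YFK-1 after its 4-nt EcoRI (AATT) overhang.
assert YFK["YFK-1"][4:-JOIN] == rc(YFK["YFK-2"])
# Right half: YFK-4 pairs with YFK-3, leaving YFK-3's 3'-AGCT (SacI-type) overhang.
assert rc(YFK["YFK-3"])[4:] == YFK["YFK-4"][:-JOIN]
# Junction: the 3' ends of YFK-1 and YFK-4 pair with each other.
assert rc(YFK["YFK-1"][-JOIN:]) == YFK["YFK-4"][-JOIN:]

top = YFK["YFK-1"] + YFK["YFK-3"]
print(len(top), "nt top strand; ends:", top[:4], "...", top[-4:])
```

All three pairings hold exactly. The assembled leader therefore has an AATT end for the EcoRI-cut vector and an AGCT 3' overhang for the SacI-cut vector.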

For in-vitro translation of the HCV protease, the new pGEM®-3Z/Yellow
Fever leader vector was digested with BamHI and blunted with Klenow
fragment.

(B) PvuII Construct from p6000

A clone, p6000, was constructed from sequences available from the
genomic library of HCV cDNA, ATCC accession no. 40394. The HCV-encoding
DNA sequence of p6000 is identical to nucleotides -275 to 6372 of
Figure 17 of WO 90/11089, published 4 October 1990. p6000 was digested with
PvuII, and a 2,864 bp fragment was isolated from the digest. This 2,864 bp
fragment was ligated to the prepared pGEM®-3Z/Yellow Fever leader vector
fragment described above.

Example 11 

(In-Vitro Expression of HCV Protease) 

(A) Transcription 

The pGEM®-3Z/Yellow Fever leader/PvuII vector was linearized
with XbaI and transcribed using the materials and protocols from Promega's
Riboprobe® Gemini II Core system.

(B) Translation 

The RNA produced by the above protocol was translated using
Promega's rabbit reticulocyte lysate (minus methionine) and canine pancreatic
microsomal membranes, together with the other necessary materials and
instructions from Promega.



Deposited Biological Materials:

The following materials were deposited with the American Type 
Culture Collection (ATCC), 12301 Parklawn Dr., Rockville, Maryland: 

Name                                        Deposit Date   Accession No.

E. coli D1210, cflSODp600                   23 Mar 1990    68275
Cf 1/5-1-1 in E. coli D1210                 11 May 1989    67967
Bacteriophage λ-gt11 cDNA library           01 Dec 1987    40394
E. coli HB101, pS356                        29 Apr 1988    67683
plasmid DNA, pDM15                          05 May 1988    40453
S. cerevisiae, 2150-2-3 (pAB24-GAP-env2)    23 Dec 1986    20827

The above materials have been deposited with the ATCC under the
accession numbers indicated. These deposits will be maintained under the terms
of the Budapest Treaty on the International Recognition of the Deposit of
Microorganisms for the Purposes of Patent Procedure. These deposits are provided
as a convenience to those of skill in the art. The polynucleotide sequences
contained in the deposited materials, as well as the amino acid sequences of the
polypeptides encoded thereby, are incorporated herein by reference and are
controlling in the event of any conflict with the sequences described herein. A
license may be required to make, use or sell the deposited materials, and no such
license is granted hereby.
1
Gly Thr
ATT CGC GGC ACC
TAA GCC CCG TGG

5
Tyr Val Tyr Asn
TAT GTT TAT AAC
ATA CAA ATA TTG

10
His Leu Thr Pro
CAT CTC ACT CCT
GTA GAG TGA GGA

Leu Arg Asp Trp
CTT CGG GAC TGG
GAA GCC CTG ACC

15 20
Ala His Asn Gly Leu Arg Asp Leu
GCG CAC AAC GGC TTG CGA GAT CTG
CGC GTG TTG CCG AAC GCT CTA GAC

25 30
Ala Val Ala Val Glu Pro Val Val
GCC GTG GCT GTA GAG CCA GTC GTC
CGG CAC CGA CAT CTC GGT CAG CAG

Phe Ser Gln Met
TTC TCC CAA ATG
AAG AGG GTT TAC

35
Glu Thr Lys Leu
GAG ACC AAG CTC
CTC TGG TTC GAG

40
Ile Thr Trp Gly
ATC ACG TGG GGG
TAG TGC ACC CCC

45
Ala Asp Thr Ala
GCA GAT ACC GCC
CGT CTA TGG CGG

50
Ala Cys Gly Asp
GCG TGC GGT GAC
CGC ACG CCA CTG

Ile Ile Asn Gly
ATC ATC AAC GGC
TAG TAG TTG CCG

55
Leu Pro Val Ser
TTG CCT GTT TCC
AAC GGA CAA AGG

60
Ala Arg Arg Gly
GCC CGC AGG GGC
CGG GCG TCC CCG

65 70
Arg Glu Ile Leu Leu Gly Pro Ala
CGG GAG ATA CTG CTC GGG CCA GCC
GCC CTC TAT GAC GAG CCC GGT CGG

75
Asp Gly Met Val Ser Lys Gly Trp
GAT GGA ATG GTC TCC AAG GGT TGG
CTA CCT TAC CAG AGG TTC CCA ACC

80
Arg Leu Leu Ala
AGG TTG CTG GCG
TCC AAC GAC CGC

85
Pro Ile Thr Ala
CCC ATC ACG GCG
GGG TAG TGC CGC

90
Tyr Ala Gln Gln
TAC GCC CAG CAG
ATG CGG GTC GTC

Thr Arg Gly Leu
ACA AGG GGC CTC
TGT TCC CCG GAG

95 100
Leu Gly Cys Ile Ile Thr Ser Leu
CTA GGG TGC ATA ATC ACC AGC CTA
GAT CCC ACG TAT TAG TGG TCG GAT

105 110
Thr Gly Arg Asp Lys Asn Gln Val
ACT GGC CGG GAC AAA AAC CAA GTG
TGA CCG CCC CTG TTT TTG GTT CAC

Glu Gly Glu Val
GAG GGT GAG GTC
CTC CCA CTC CAG

115
Gln Ile Val Ser
CAG ATT GTG TCA
GTC TAA CAC AGT

120
Thr Ala Ala Gln
ACT GCT GCC CAA
TGA CGA CGG GTT

125
Thr Phe Leu Ala
ACC TTC CTG GCA
TGG AAG GAC CGT
Figure 1 
[Figure 1, continued: sequence block labeled positions 130 to 380, including the NaeI and NdeI sites; residue and codon columns not legible in this copy]

PCT/US91/02210
385

Leu Gly Ile Gly

TTG GGC ATT GGC 
AAC CCG TAA CCG 



390 

Thr Val Leu Asp 

ACT GTC CTT GAC 
TGA CAG GAA CTG 



Gln Ala Glu Thr

CAA GCA GAG ACT 
GTT CGT CTC TGA 



395 

Ala Gly Ala Arg 

GCG GGG GCG AGA 
CGC CCC CGC TCT 





400

Leu Val Val Leu

CTG GTT GTG CTC
GAC CAA CAC GAG



405 

Ala Thr Ala Thr 

GCC ACC GCC ACC 
CGG TGG CGG TGG 



410 

Pro Pro Gly Ser 

CCT CCG GGC TCC 
GGA GGC CCG AGG 



Val Thr Val Pro 

GTC ACT GTG CCC 
CAG TGA CAC GGG 



415 

His Pro Asn Ile
CAT CCC AAC ATC 
GTA GGG TTG TAG 



420 

Glu Glu Val Ala 
GAG GAG GTT GCT 
CTC CTC CAA CGA 



425 

Leu Ser Thr Thr 

CTG TCC ACC ACC 
GAC AGG TGG TGG 



430 

Gly Glu Ile Pro
GGA GAG ATC CCT 
CCT CTC TAG GGA 



435 440 445 

Phe Tyr Gly Lys Ala Ile Pro Leu Glu Val Ile Lys Gly Gly Arg His

TTT TAC GGC AAG GCT ATC CCC CTC GAA GTA ATC AAG GGG GGG AGA CAT 
AAA ATG CCG TTC CGA TAG GGG GAG CTT CAT TAG TTC CCC CCC TCT GTA 



450 

Leu Ile Phe Cys
CTC ATC TTC TGT 
GAG TAG AAG ACA 



His Ser Lys Lys 
CAT TCA AAG AAG 
GTA AGT TTC TTC 



455 

Lys Cys Asp Glu 

AAG TGC GAC GAA 
TTC ACG CTG CTT 



460 

Leu Ala Ala Lys 

CTC GCC GCA AAG 
GAG CGG CGT TTC 







465

Leu Val Ala Leu

CTG GTC GCA TTG
GAC CAG CGT AAC



470 

Gly Ile Asn Ala

GGC ATC AAT GCC 
CCG TAG TTA CGG 



Val Ala Tyr Tyr 
GTG GCC TAC TAC 
CAC CGG ATG ATG 



475 

Arg Gly Leu Asp 
CGC GGT CTT GAC 
GCG CCA GAA CTG 





480

Val Ser Val Ile

GTG TCC GTC ATC
CAC AGG CAG TAG



485 

Pro Thr Ser Gly 

CCG ACC AGC GGC 
GGC TGG TCG CCG 



490 

Asp Val Val Val

GAT GTT GTC GTC 
CTA CAA CAG CAG 



Val Ala Thr Asp 

GTG GCA ACC GAT 
CAC CGT TGG CTA 
495

Ala Leu Met Thr 

GCC CTC ATG ACC
CGG GAG TAC TGG 



500 

Gly Tyr Thr Gly 

GGC TAT ACC GGC 
CCG ATA TGG CCG 



2079105 
505

Asp Phe Asp Ser 

GAC TTC GAC TCG 
CTG AAG CTG AGC 
510

Val Ile Asp Cys

GTG ATA GAC TGC 
CAC TAT CTG ACG 



Asn Thr Cys Val 

AAT ACG TGT GTC 
TTA TGC ACA CAG 



515 

Thr Gln Thr Val

ACC CAG ACA GTC 
TGG GTC TGT CAG 



520 

Asp Phe Ser Leu 

GAT TTC AGC CTT 
CTA AAG TCG GAA 



525 

Asp Pro Thr Phe 

GAC CCT ACC TTC 
CTG GGA TGG AAG 



530 

Thr Ile Glu Thr Ile Thr Leu Pro

ACC ATT GAG ACA ATC ACG CTC CCC 
TGG TAA CTC TGT TAG TGC GAG GGG 



535 540 

Gln Asp Ala Val Ser Arg Thr Gln

CAA GAT GCT GTC TCC CGC ACT CAA 
GTT CTA CGA CAG AGG GCG TGA GTT 



545 

Arg Arg Gly Arg 

CGT CGG GGC AGG 
GCA GCC CCG TCC 



550 

Thr Gly Arg Gly 

ACT GGC AGG GGG 
TGA CCG TCC CCC 



Lys Pro Gly Ile
AAG CCA GGC ATC 
TTC GGT CCG TAG 



555 

Tyr Arg Phe Val 
TAC AGA TTT GTG 
ATG TCT AAA CAC 



560 

Ala Pro Gly Glu 

GCA CCG GGG GAG 
CGT GGC CCC CTC 



565 

Arg Pro Pro Gly 

CGC CCT CCC GGC 
GCG GGA GGG CCG 



570 

Met Phe Asp Ser 

ATG TTC GAC TCG 
TAC AAG CTG AGC 



Ser Val Leu Cys 
TCC GTC CTC TGT 
AGG CAG GAG ACA 



575 

Glu Cys Tyr Asp

GAG TGC TAT GAC 
CTC ACG ATA CTG 



580 

Ala Gly Cys Ala
GCA GGC TGT GCT 
CGT CCG ACA CGA 



585 

Trp Tyr Glu Leu 
TGG TAT GAG CTC 
ACC ATA CTC GAG 



590 

Thr Pro Ala Glu 
ACG CCC GCC GAG 
TGC GGG CGG CTC 



Thr Thr Val Arg 

ACT ACA GTT AGG 
TGA TGT CAA TCC 



595 

Leu Arg Ala Tyr 
CTA CGA GCG TAC 
GAT GCT CGC ATG 



600 

Met Asn Thr Pro 
ATG AAC ACC CCG 
TAC TTG TGG GGC 



605 

Gly Leu Pro Val 
GGG CTT CCC GTG 
CCC GAA GGG CAC 
610

Cys Gln Asp His

TGC CAG GAC CAT
ACG GTC CTG GTA 



Leu Glu Phe Trp 
CTT GAA TTT TGG 
GAA CTT AAA ACC 



615 

Glu Gly Val Phe 

GAG GGC GTC TTT 
CTC CCG CAG AAA 



620 

Thr Gly Leu Thr 

ACA GGC CTC ACT 
TGT CCG GAG TGA 



625 

His Ile Asp Ala
CAT ATA GAT GCC 
GTA TAT CTA CGG 



630 

His Phe Leu Ser 
CAC TTT CTA TCC 
GTG AAA GAT AGG 



Gln Thr Lys Gln

CAG ACA AAG CAG 
GTC TGT TTC GTC 



635 

Ser Gly Glu Asn 
AGT GGG GAG AAC 
TCA CCC CTC TTG 



640 

Leu Pro Tyr Leu 
CTT CCT TAC CTG 
GAA GGA ATG GAC 



645 

Val Ala Tyr Gln
GTA GCG TAC CAA 
CAT CGC ATG GTT 



650 

Ala Thr Val Cys
GCC ACC GTG TGC 
CGG TGG CAC ACG 



Ala Arg Ala Gin 
GCT AGG GCT CAA 
CGA TCC CGA GTT 



655 

Ala Pro Pro Pro 
GCC CCT CCC CCA 
CGG GGA GGG GGT 



660 

Ser Trp Asp Gln
TCG TGG GAC CAG 
AGC ACC CTG GTC 



665 

Met Trp Lys Cys 
ATG TGG AAG TGT 
TAC ACC TTC ACA 









670

Leu Ile Arg Leu

TTG ATT CGC CTC
AAC TAA GCG GAG



Lys Pro Thr Leu 

AAG CCC ACC CTC 
TTC GGG TGG GAG 



675 

His Gly Pro Thr 
CAT GGG CCA ACA
GTA CCC GGT TGT 



680 

Pro Leu Leu Tyr 

CCC CTG CTA TAC 
GGG GAC GAT ATG 



685 

Arg Leu Gly Ala 

AGA CTG GGC GCT 
TCT GAC CCG CGA 



Figure 1



WO 91/15575 
C20c:

Asn Ser Glu Asn Gln Val Glu Gly

AAT TCG GAA AAC CAA GTG GAG GGT
TTA AGC CTT TTG GTT CAC CTC CCA

↑

EcoRI
Glu Val Gln Ile Val Ser Thr Ala

GAG GTC CAG ATT GTG TCA ACT GCT 
CTC CAG GTC TAA CAC AGT TGA CGA 



Ala Gln Thr Phe Leu Ala Thr Cys Ile Asn Gly Val Cys Trp Thr Val

GCC CAA ACC TTC CTG GCA ACG TGC ATC AAT GGG GTG TGC TGG ACT GTC
CGG GTT TGG AAG GAC CGT TGC ACG TAG TTA CCC CAC ACG ACC TGA CAG

↑

SfaNI



Tyr His Gly Ala Gly Thr Arg Thr Ile Ala Ser Pro Lys Gly Pro Val

TAC CAC GGG GCC GGA ACG AGG ACC ATC GCG TCA CCC AAG GGT CCT GTC 
ATG GTG CCC CGG CCT TGC TCC TGG TAG CGC AGT GGG TTC CCA GGA CAG 



Ile Gln Met Tyr Thr Asn Val Asp Gln Asp Leu Val Gly Trp Pro Ala

ATC CAG ATG TAT ACC AAT GTA GAC CAA GAC CTT GTG GGC TGG CCC GCT 
TAG GTC TAC ATA TGG TTA CAT CTG GTT CTG GAA CAC CCG ACC GGG CGA 



Ser Gin Gly Thr Arg Ser Leu Thr Pro Cys Thr Cys Gly Ser Ser Asp 
TCG CAA GGT ACC CGC TCA TTG ACA CCC TGC ACT TGC GGC TCC TCG GAC 
AGC GTT CCA TGG GCG AGT AAC TGT GGG ACG TGA ACG CCG AGG AGC CTG 



Leu Tyr Leu Val Thr Arg His Ala Asp Val Ile Pro Val Arg Arg Arg

CTT TAC CTG GTC ACG AGG CAC GCC GAT GTC ATT CCC GTG CGC CGG CGG
GAA ATG GAC CAG TGC TCC GTG CGG CTA CAG TAA GGG CAC GCG GCC GCC

↑

NaeI

Gly Asp Ser Arg Gly Ser Leu Val Ser Pro Arg Pro Ile Ser Tyr Leu

GGT GAT AGC AGG GGC AGC CTC GTG TCG CCC CGG CCC ATT TCC TAC TTG 
CCA CTA TCG TCC CCG TCG GAG CAC AGC GGG GCC GGG TAA AGG ATG AAC 



Lys Gly Ser Ser Gly 

AAA GGC TCC TCG GGG 
TTT CCG AGG AGC CCC 



Gly Pro Leu Pro Asn 

GGT CCG CTG CCG AAT TC 
CCA GGC GAC GGC TTA AG 

↑

EcoRI
C26d:

Glu Phe Gly Gly Leu Leu Leu Cys 
GAA TTC GGG GGC CTG CTG TTG TGC 
CTT AAG CCC CCG GAC GAC AAC ACG 
↑

EcoRI



Pro Ala Ala Ala Val Gly Ile Phe
CCC GCG GCA GCC GTG GGC ATA TTT 
GGG CGC CGT CGG CAC CCG TAT AAA 



Arg Ala Ala Val Cys Thr Arg Gly 

AGG GCC GCG GTG TGC ACC CGT GGA 
TCC CGG CGC CAC ACG TGG GCA CCT 



Val Ala Lys Ala Val Asp Phe Ile

GTG GCT AAG GCG GTG GAC TTT ATC
CAC CGA TTC CGC CAC CTG AAA TAG
↑

DdeI



Pro Val Glu Asn Leu Glu Thr Thr Met Arg Ser Pro Val Phe Thr Asp

CCT GTG GAG AAC CTA GAG ACA ACC ATG AGG TCC CCG GTG TTC ACG GAT 
GGA CAC CTC TTG GAT CTC TGT TGG TAC TCC AGG GGC CAC AAG TGC CTA 



Asn Ser Ser Pro Pro Val Val Pro 

AAC TCC TCT CCA CCA GTA GTG CCC 
TTG AGG AGA GGT GGT CAT CAC GGG 



Gln Ser Phe Gln Val Ala His Leu

CAG AGC TTC CAG GTG GCT CAC CTC 
GTC TCG AAG GTC CAC CGA GTG GAG 

t 

EcoRII 



His Ala Pro Arg Ile

CAT GCT CCC CGA ATT C 
GTA CGA GGG GCT TAA G 

t 

EcoRI 



Figure 3 
Pro Cys Thr Cys Gly Ser Ser Asp Leu Tyr Leu Val Thr Arg His Ala

CCC TGC ACT TGC GGC TCC TCG GAC CTT TAC CTG GTC ACG AGG CAC GCC
GGG ACG TGA ACG CCG AGG AGC CTG GAA ATG GAC CAG TGC TCC GTG CGG



Asp Val Ile Pro Val Arg Arg Arg Gly Asp Ser Arg Gly Ser Leu Ser

GAT GTC ATT CCC GTG CGC CGG CGG GGT GAT AGC AGG GGC AGC CTG TCG
CTA CAG TAA GGG CAC GCG GCC GCC CCA CTA TCG TCC CCG TCG GAC AGC



Pro Arg Pro Ile Ser Tyr Leu Lys Gly Ser Ser Gly Gly Pro Leu Cys

CCC CGG CCC ATT TCC TAC TTG AAA GGC TCC TCG GGG GGT CCG TTG TGC
GGG GCC GGG TAA AGG ATG AAC TTT CCG AGG AGC CCC CCA GGC AAC ACG



Pro Ala Gly His Ala Val Gly Ile Phe Arg Ala Ala Val Thr Arg Gly

CCC GCG GGG CAC GCC GTG GGC ATA TTT AGG GCC GCG GTG ACC CGT GGA
GGG CGC CCC GTG CGG CAC CCG TAT AAA TCC CGG CGC CAC TGG GCA CCT



Val Ala Lys Ala Val Asp Phe Ile Pro Val Glu Asn

GTG GCT AAG GCG GTG GAC TTT ATC CCT GTG GAG AAC
CAC CGA TTC CGC CAC CTG AAA TAG GGA CAC CTC TTG

DdeI



Glu Thr Thr Met Arg Ser Pro Val Phe Thr Asp Asn Ser

GAG ACA ACC ATG AGG TCC CCG GTG TTC ACG GAT AAC TCC TC
CTC TGT TGG TAC TCC AGG GGC CAC AAG TGC CTA TTG AGG AG


Figure 4
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£21: 

Ile Arg Gly Thr Tyr Val Tyr Asn His Leu Thr Pro Leu Arg Asp Trp

ATT CGG GGC ACC TAT GTT TAT AAC CAT CTC ACT CCT CTT CGG GAC TGG 
TAA GCC CCG TGG ATA CAA ATA TTG GTA GAG TGA GGA GAA GCC CTG ACC 
t 

EcoRI 

Ala His Asn Gly Leu Arg Asp Leu Ala Val Ala Val Glu Pro Val Val 

GCG CAC AAC GGC TTG CGA GAT CTG GCC GTG GCT GTA GAG CCA GTC GTC 
CGC GTG TTG CCG AAC GCT CTA GAC CGG CAC CGA CAT CTC GGT CAG CAG 



Phe Ser Gln Met Glu Thr Lys Leu Ile Thr Trp Gly Ala Asp Thr Ala

TTC TCC CAA ATG GAG ACC AAG CTC ATC ACG TGG GGG GCA GAT ACC GCC 
AAG AGG GTT TAC CTC TGG TTC GAG TAG TGC ACC CCC CGT CTA TGG CGG



Ala Cys Gly Asp Ile Ile Asn Gly Leu Pro Val Ser Ala Arg Arg Gly

GCG TGC GGT GAC ATC ATC AAC GGC TTG CCT GTT TCC GCC CGC AGG GGC 
CGC ACG CCA CTG TAG TAG TTG CCG AAC GGA CAA AGG CGG GCG TCC CCG 



Arg Glu Ile Leu Leu Gly Pro Ala Asp Gly Met Val Ser Lys Gly Trp

CGG GAG ATA CTG CTC GGG CCA GCC GAT GGA ATG GTC TCC AAG GGT TGG 
GCC CTC TAT GAC GAG CCC GGT CGG CTA CCT TAC CAG AGG TTC CCA ACC 



Arg Leu Leu Ala Pro Ile Thr Ala
AGG TTG CTG GCG CCC ATC ACG GCG 
TCC AAC GAC CGC GGG TAG TGC CGC 



Tyr Ala Gln Gln Thr Arg Gly Leu

TAC GCC CAG CAG ACA AGG GGC CTC 
ATG CGG GTC GTC TGT TCC CCG GAG 



Leu Gly Cys Ile Ile Thr Ser Leu

CTA GGG TGC ATA ATC ACC AGC CTA 
GAT CCC ACG TAT TAG TGG TCG GAT 



Thr Gly Arg Asp Lys Asn Gln Val

ACT GGC CGG GAC AAA AAC CAA GTG 
TGA CCG GCC CTG TTT TTG GTT CAC 



Glu Gly Glu Val Gln Ile Val Ser
GAG GGT GAG GTC CAG ATT GTG TCA 
CTC CCA CTC CAG GTC TAA CAC AGT 



Thr Ala Ala Gln Thr Phe Leu Ala
ACT GCT GCC CAA ACC TTC CTG GCA 
TGA CGA CGG GTT TGG AAG GAC CGT 



Thr Cys Ile Asn Gly

ACG TGC ATC AAT GGG 
TGC ACG TAG TTA CCC 

T 

SfaNI 



Val Cys Trp Pro Asn 
GTG TGC TGG CCG AAT TC 
CAC ACG ACC GGC TTA AG 

t 

EcoRI



Figure 5 
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Glu Phe Gly Ser 

GAA TTC GGG TCC 
CTT AAG CCC AGG 

r 

EcoRI 



Val Ile Pro Thr

GTC ATC CCG ACC 
CAG TAG GGC TGG 



Ser Gly Asp Val 

AGC GGC GAT GTT 
TCG CCG CTA CAA 



Val Val Val Ala

GTC GTC GTC GCA 
CAG CAG CAG CGT 



Thr Asp Ala Leu Met Thr Gly Tyr

ACC GAT GCC CTC ATG ACC GGC TAT 
TGG CTA CGG GAG TAC TGG CCG ATA 



Thr Gly Asp Phe Asp Ser Val Ile

ACC GGC GAC TTC GAC TCG GTG ATA 
TGG CCG CTG AAG CTG AGC CAC TAT 

T 

HinfI



Asp Cys Asn Thr Cys Val Thr Gln Thr Val Asp Phe Ser Leu Asp Pro

GAC TGC AAT ACG TGT GTC ACC CAG ACA GTC GAT TTC AGC CTT GAC CCT 
CTG ACG TTA TGC ACA CAG TGG GTC TGT CAG CTA AAG TCG GAA CTG GGA 



Thr Phe Thr Ile Glu Thr Ile Thr Leu Pro Gln Asp Ala Val Ser Arg

ACC TTC ACC ATT GAG ACA ATC ACG CTC CCC CAA GAT GCT GTC TCC CGC 
TGG AAG TGG TAA CTC TGT TAG TGC GAG GGG GTT CTA CGA CAG AGG GCG 



Thr Gln Arg Arg

ACT CAA CGT CGG 
TGA GTT GCA GCC 



Gly Arg Thr Gly 

GGC AGG ACT GGC 
CCG TCC TGA CCG 



Arg Gly Lys Pro 

AGG GGG AAG CCA 
TCC CCC TTC GGT 



Gly Ile Tyr Arg
GGC ATC TAC AGA 
CCG TAG ATG TCT 



Phe Val Ala Pro Gly Glu Arg Pro Ser Gly Met Phe Asp Ser Ser Val 
TTT GTG GCA CCG GGG GAG CGC CCC TCC GGC ATG TTC GAC TCG TCC GTC
AAA CAC CGT GGC CCC CTC GCG GGG AGG CCG TAC AAG CTG AGC AGG CAG 

T t 

BglI HinfI



Leu Cys Glu Cys Pro Asn 
CTC TGT GAG TGC CCG AAT TC 
GAG ACA CTC ACG GGC TTA AG 

t 

EcoRI



Ile Arg Ser Ile Glu Thr Ile Thr

ATT CGG TCC ATT GAG ACA ATC ACG 
TAA GCC AGG TAA CTC TGT TAG TGC 

T 

EcoRI
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TCC GAT 
Leu Pro Gln Asp Ala Val Ser Arg

CTC CCC CAG GAT GCT GTC TCC CGC 
GAG GGG GTC CTA CGA CAG AGG GCG 



Cys Ala Trp Tyr Glu Leu Thr Pro 
TGT GCT TGG TAT GAG CTC ACG CCC 
ACA CGA ACC ATA CTC GAG TGC GGG 

Ala Tyr Met Asn Thr Pro Gly Leu

GCG TAC ATG AAC ACC CCG GGG CTT 
CGC ATG TAC TTG TGG GGC CCC GAA 



Arg Thr Gly Arg Gly 

AGG ACT GGC AGG GGG 
TCC TGA CCG TCC CCC 

Glu Arg Pro Ser Gly 

GAG CGC CCC TCC GGC 
CTC GCG GGG AGG CCG 

t 

BglI



Lys Pro Gly Ile Tyr Arg

AAG CCA GGC ATC TAC AGA 
TTC GGT CCG TAG ATG TCT 

Met Phe Asp Ser Ser Val 
ATG TTC GAC TCG TCC GTC 
TAC AAG CTG AGC AGG CAG 



Pro Val Cys Gln Asp His Leu Glu Phe Trp Glu Gly Val Phe Thr Gly

CCC GTG TGC CAG GAC CAT CTT GAA TTT TGG GAG GGC GTC TTT ACA GGC 

GGG CAC ACG GTC CTG GTA GAA CTT AAA ACC CTC CCG CAG AAA TGT CCG 

Leu Thr His Ile Asp Ala His Phe Leu Ser Gln Thr Lys Gln Ser Gly

CTC ACT CAT ATA GAT GCC CAC TTT CTA TCC CAG ACA AAG CAG AGT GGG 

GAG TGA GTA TAT CTA CGG GTG AAA GAT AGG GTC TGT TTC GTC TCA CCC 

Glu Asn Leu Pro Tyr Leu Val Ala Tyr Gln Ala Thr Val Cys Ala Arg

GAG AAC CTT CCT TAC CTG GTA GCG TAC CAA GCC ACC GTG TGC GCT AGG 

CTC TTG GAA GGA ATG GAC CAT CGC ATG GTT CGG TGG CAC ACG CGA TCC 

Ala Gln Ala Pro Pro Pro Ser Trp Asp Gln Met Trp Lys Cys Leu Ile

GCT CAA GCC CCT CCC CCA TCG TGG GAC CAG ATG TGG AAG TGT TTG ATT

CGA GTT CGG GGA GGG GGT AGC ACC CTG GTC TAC ACC TTC ACA AAC TAA 

Arg Leu Lys Pro Thr Leu His Gly Pro Thr Pro Leu Leu Tyr Arg Leu 

CGC CTC AAG CCC ACC CTC CAT GGG CCA ACA CCC CTG CTA TAC AGA CTG 

GCG GAG TTC GGG TGG GAG GTA CCC GGT TGT GGG GAC GAT ATG TCT GAC 

Gly Ala Ala Glu Phe 
GGC GCT GCC GAA TTC 
CCG CGA CGG CTT AAG 

t 

EcoRI 



Figure 7



2079105 

WO 91/15575 

13/21 



Glu Phe Gly Ala 

GAA TTC GGG GCG 
CTT AAG CCC CGC 
T 

EcoRI 



Val Asp Phe Ile
GTG GAC TTT ATC 
CAC CTG AAA TAG 



Pro Val Glu Asn 
CCT GTG GAG AAC 
GGA CAC CTC TTG 
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Leu Glu Thr Thr 
CTA GAG ACA ACC 
GAT CTC TGT TGG 



Pro Val Val Pro 

CCA GTA GTG CCC 
GGT CAT CAC GGG 



Met Arg Ser Pro Val Phe 

ATG AGG TCC CCG GTG TTC 
TAC TCC AGG GGC CAC AAG 



Thr Asp Asn Ser Ser Pro 

ACG GAT AAC TCC TCT CCA 
TGC CTA TTG AGG AGA GGT 



Gln Ser Phe Gln Val Ala His Leu His Ala Pro Thr Gly Ser Gly Lys

CAG AGC TTC CAG GTG GCT CAC CTC CAT GCT CCC ACA GGC AGC GGC AAA 
GTC TCG AAG GTC CAC CGA GTG GAG GTA CGA GGG TGT CCG TCG CCG TTT 



Ser Thr Lys Val Pro Ala Ala Tyr Ala Ala Gln Gly Tyr Lys Val Leu

AGC ACC AAG GTC CCG GCT GCA TAT GCA GCT CAG GGC TAT AAG GTG CTA 
TCG TGG TTC CAG GGC CGA CGT ATA CGT CGA GTC CCG ATA TTC CAC GAT 



Val Leu Asn Pro Ser Val Ala Ala Thr Leu Gly Phe Gly Ala Tyr Met

GTA CTC AAC CCC TCT GTT GCT GCA ACA CTG GGC TTT GGT GCT TAC ATG 
CAT GAG TTG GGG AGA CAA CGA CGT TGT GAC CCG AAA CCA CGA ATG TAC 



Ser Lys Ala His Gly Ile Asp Pro Asn Ile Arg Thr Gly Val Arg Thr

TCC AAG GCT CAT GGG ATC GAT CCT AAC ATC AGG ACC GGG GTG AGA ACA 
AGG TTC CGA GTA CCC TAG CTA GGA TTG TAG TCC TGG CCC CAC TCT TGT 



Ile Thr Thr Gly Ser Pro Ile Thr Tyr Ser Thr Tyr Gly Lys Phe Leu
ATT ACC ACT GGC AGC CCC ATC ACG TAC TCC ACC TAC GGC AAG TTC CTT 
TAA TGG TGA CCG TCG GGG TAG TGC ATG AGG TGG ATG CCG TTC AAG GAA 



Ala Asp Gly Gly Cys Ser Gly Gly Ala Tyr Asp Ile Ile Ile Cys Asp
GCC GAC GGC GGG TGC TCG GGG GGC GCT TAT GAC ATA ATA ATT TGT GAC 
CGG CTG CCG CCC ACG AGC CCC CCG CGA ATA CTG TAT TAT TAA ACA CTG 



Glu Cys His Ser Thr Asp Ala Thr Ser Ile Leu Gly Ile Gly Thr Val

GAG TGC CAC TCC ACG GAT GCC ACA TCC ATC TTG GGC ATT GGC ACT GTC 
CTC ACG GTG AGG TGC CTA CGG TGT AGG TAG AAC CCG TAA CCG TGA CAG 
Leu Asp Gln Ala Glu Thr Ala Gly Ala Arg Leu Val Val Leu Ala Thr
CTT GAC CAA GCA GAG ACT GCG GGG GCG AGA CTG GTT GTG CTC GCC ACC 
GAA CTG GTT CGT CTC TGA CGC CCC CGC TCT GAC CAA CAC GAG CGG TGG 



Ala Thr Pro Pro Gly Ser Val Thr 
GCC ACC CCT CCG GGC TCC GTC ACT 
CGG TGG GGA GGC CCG AGG CAG TGA 



Val Pro His Pro Asn Ile Glu Glu
GTG CCC CAT CCC AAC ATC GAG GAG 
CAC GGG GTA GGG TTG TAG CTC CTC 



Val Ala Leu Ser Thr Thr Gly Glu Ile Pro Phe Tyr Gly Lys Ala Ile

GTT GCT CTG TCC ACC ACC GGA GAG ATC CCT TTT TAC GGC AAG GCT ATC 
CAA CGA GAC AGG TGG TGG CCT CTC TAG GGA AAA ATG CCG TTC CGA TAG 



Pro Leu Glu Val Ile Lys Gly Gly Arg His Leu Ile Phe Cys His Ser
CCC CTC GAA GTA ATC AAG GGG GGG AGA CAT CTC ATC TTC TGT CAT TCA 
GGG GAG CTT CAT TAG TTC CCC CCC TCT GTA GAG TAG AAG ACA GTA AGT 



Lys Lys Lys Cys Asp Glu Leu Ala Ala Lys Leu Val Ala Leu Gly Ile
AAG AAG AAG TGC GAC GAA CTC GCC GCA AAG CTG GTC GCA TTG GGC ATC 
TTC TTC TTC ACG CTG CTT GAG CGG CGT TTC GAC CAG CGT AAC CCG TAG 



Asn Ala Val Ala Tyr Tyr Arg Gly Leu Asp Val Ser Val Ile Pro Thr

AAT GCC GTG GCC TAC TAC CGC GGT CTT GAC GTG TCC GTC ATC CCG ACC 
TTA CGG CAC CGG ATG ATG GCG CCA GAA CTG CAC AGG CAG TAG GGC TGG 



Ser Gly Asp Val Val Val Val Ala Thr Asp Ala Leu Met Thr Gly Tyr 
AGC GGC GAT GTT GTC GTC GTG GCA ACC GAT GCC CTC ATG ACC GGC TAT 
TCG CCG CTA CAA CAG CAG CAC CGT TGG CTA CGG GAG TAC TGG CCG ATA 



Thr Gly Asp Phe Asp Ser Val Ile

ACC GGC GAC TTC GAC TCG GTG ATA
TGG CCG CTG AAG CTG AGC CAC TAT 

t 

HinfI



Asp Cys Asn Thr Cys Ala Glu Phe 

GAC TGC AAT ACG TGT GCC GAA TTC 
CTG ACG TTA TGC ACA CGG CTT AAG 

t 

EcoRI



Figure 8 (Continued) 



WO 91/15575 2079105 PCT/US91/02210 


[Clone map: overlapping clones C35, C31, C33c, C200, C7f, C20c, C7f+C20c, C8h, C26d and C300, with BglI, HinfI, SfaNI, DdeI and EcoRII sites marked]




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-155 -150 

Met Ala Thr Asn Pro Val Cys Val Leu

ATG GCT ACA AAC CCT GTT TGC GTT TTG 
TAC CGA TGT TTG GGA CAA ACG CAA AAC 
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Lys Val Trp 


Gly 
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-100 


Gly 
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His 


Gly Phe His Val His 


Glu Phe Gly 


Asp 


Asn 


Thr Ala Gly 


GGC 


CTG 


CAT 


GGA 


TTC 


CAT 


GTT CAT 


GAG 


TTT 


GGA 


GAT 


AAT 
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GCA 


GGC 


CCG 
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CCT 
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GTA 


CAA GTA 
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AAA 


CCT 
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TTA 


TGT 


CGT 


CCG 
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Cys 


Thr 


Ser 


Pro 


Gly 


Pro 


His Phe 


Asn 


Pro 


Leu 


Ser 


Arg 


Lys 


His Gly 


TGT 


ACC 


AGT 


CCA 


GGT 


CCT 
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AAA 
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GGA 
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Gly 


Pro 


Lys 


Asp Glu Glu Arg His 


Val 


Gly Asp 


Leu 


Gly 


Asn 


Val 


Thr 
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GAT 
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-65 

Ala Asp Lys Asp Gly 

GCT GAC AAA GAT GGT 
CGA CTG TTT CTA CCA 

-50 

Ser Leu Ser Gly Asp 

TCA CTC TCA GGA GAC 
AGT GAG AGT CCT CTG 

-30 

Glu Lys Ala Asp Asp 

GAA AAA GCA GAT GAC 
CTT TTT CGT CTA CTG 

-15 

Thr Gly Asn Ala Gly 

ACA GGA AAC GCT GGA 
TGT CCT TTG CGA CCT 



-60 

Val Ala Asp Val Ser Ile

GTG GCC GAT GTG TCT ATT 
CAC CGG CTA CAC AGA TAA 

-45 -40 

His Cys Ile Ile Gly Arg
CAT TGC ATC ATT GGC CGC 
GTA ACG TAG TAA CCG GCG 

-25 

Leu Gly Lys Gly Gly Asn 

TTG GGC AAA GGT GGA AAT 
AAC CCG TTT CCA CCT TTA 

-10 

Ser Arg Leu Ala Cys Gly 
AGT CGT TTG GCT TGT GGT 
TCA GCA AAC CGA ACA CCA 



-55 

Glu Asp Ser Val Ile

GAA GAT TCT GTG ATC 
CTT CTA AGA CAC TAG 

-35 

Thr Leu Val Val His 

ACA CTG GTG GTC CAT 
TGT GAC CAC CAG GTA 

-20 

Glu Glu Ser Thr Lys 
GAA GAA AGT ACA AAG 
CTT CTT TCA TGT TTC 

-5 

Val Ile Gly Ile Arg
GTA ATT GGG ATC CGA 
CAT TAA CCC TAG GCT 






1 5 

Ile Arg Gly Thr Tyr Val Tyr Asn

ATT CGG GGC ACC TAT GTT TAT AAC
TAA GCC CCG TGG ATA CAA ATA TTG 



10 

His Leu Thr Pro Leu Arg Asp Trp 

CAT CTC ACT CCT CTT CGG GAC TGG 
GTA GAG TGA GGA GAA GCC CTG ACC 



15 

Ala His Asn Gly 

GCG CAC AAC GGC 
CGC GTG TTG CCG 



20 

Leu Arg Asp Leu 

TTG CGA GAT CTG 
AAC GCT CTA GAC 



25 

Ala Val Ala Val 

GCC GTG GCT GTA 
CGG CAC CGA CAT 









30 


Glu


Pro 


Val


Val 


GAG 


CCA 


GTC 


GTC 


CTC 


GGT 


CAG 


CAG 



Phe Ser Gln Met

TTC TCC CAA ATG 
AAG AGG GTT TAC 



35 

Glu Thr Lys Leu 
GAG ACC AAG CTC 
CTC TGG TTC GAG 



40 

Ile Thr Trp Gly

ATC ACG TGG GGG 
TAG TGC ACC CCC 



45 

Ala Asp Thr Ala 
GCA GAT ACC GCC 
CGT CTA TGG CGG 



50 

Ala Cys Gly Asp 
GCG TGC GGT GAC 
CGC ACG CCA CTG 



Ile Ile Asn Gly
ATC ATC AAC GGC 
TAG TAG TTG CCG 



55 

Leu Pro Val Ser 
TTG CCT GTT TCC 
AAC GGA CAA AGG 





60 






Ala 


Arg 


Arg 


Gly 


GCC 


CGC 


AGG 


GGC 


CGG 


GCG 


TCC 


CCG 



65 

Arg Glu Ile Leu
CGG GAG ATA CTG 
GCC CTC TAT GAC 



70 

Leu Gly Pro Ala 
CTC GGG CCA GCC 
GAG CCC GGT CGG 



Asp Gly Met Val
GAT GGA ATG GTC 
CTA CCT TAC CAG 



75 








Ser 


Lys 


Gly 


Trp 


TCC 


AAG 


GGT 


TGG 


AGG 


TTC 


CCA 


ACC 



80 85 90 

Arg Leu Leu Ala Pro Ile Thr Ala Tyr Ala Gln Gln Thr Arg Gly Leu

AGG TTG CTG GCG CCC ATC ACG GCG TAC GCC CAG CAG ACA AGG GGC CTC
TCC AAC GAC CGC GGG TAG TGC CGC ATG CGG GTC GTC TGT TCC CCG GAG



95 

Leu Gly Cys Ile
CTA GGG TGC ATA 
GAT CCC ACG TAT 



100 

Ile Thr Ser Leu
ATC ACC AGC CTA 
TAG TGG TCG GAT 



105 

Thr Gly Arg Asp 
ACT GGC CGG GAC 
TGA CCG GCC CTG 









110 


Lys 


Asn 


Gln


Val 


AAA 


AAC 


CAA 


GTG 


TTT 


TTG 


GTT 


CAC 



Glu Gly Glu Val 
GAG GGT GAG GTC 
CTC CCA CTC CAG 



115 

Gln Ile Val Ser

CAG ATT GTG TCA 
GTC TAA CAC AGT 



120 

Thr Ala Ala Gln

ACT GCT GCC CAA 
TGA CGA CGG GTT 







125 




Thr 


Phe 


Leu 


Ala 


ACC 


TTC 


CTG 


GCA 


TGG 


AAG 


GAC 


CGT 
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130 



Thr 

ACG 
TGC 


Cys 

TGC 
ACG 


Ile

ATC 
TAG 


Ile
ATC 
TAG 


Thr 
ACG 
TGC 


Arg 
AGG 
TCC 


145 
Thr 
ACC 
TGG 


Ile
ATC 
TAG 


Asn 
AAT 
TTA 


160 
Val 
GTA 
CAT 


Asp 
GAC 
CTG 


Gln
CAA 
GTT 


175 

Ser 
TCA 
AGT 


Leu 

TTG 
AAC 


Thr 

ACA 
TGT 


Pro 

CCC

GGG 


Arg 
AGG 
TCC 


His 

CAC 
GTG 


Ala 

GCC 
CGG 


Asp 
GAT 
CTA 


Ser 
AGC 


Leu 

CTG 

GAC 


Leu 
CTG 


210 

Ser 

TCG 

AGC 


Gly 
GGT 
CCA 


Pro 
CCG 
GGC 


225 
Leu 
CTG 
GAC 


Leu 

TTG 
AAC 


Ala 

GCG 
CGC 


240 
Val 
GTG 
CAC 


Cys 
TGC 
ACG 


Thr 

ACC 
TGG 



Asn Gly Val Cys 

AAT GGG GTG TGC 
TTA CCC CAC ACG 



150 

Ala Ser Pro Lys 

GCG TCA CCC AAG 
CGC AGT GGG TTC 



165 

Asp Leu Val Gly 

GAC CTT GTG GGC 
CTG GAA CAC CCG 



180 

Cys Thr Cys Gly 

TGC ACT TGC GGC 
ACG TGA ACG CCG 



195 

Val Ile Pro Val
GTC ATT CCC GTG
CAG TAA GGG CAC



Pro Arg Pro Ile

CCC CGG CCC ATT 
GGG GCC GGG TAA 



230 

Cys Pro Ala Gly 

TGC CCC GCG GGG 
ACG GGG CGC CCC 



245 

Arg Gly Val Ala 

CGT GGA GTG GCT 
GCA CCT CAC CGA 



135 

Trp Thr Val Tyr 
TGG ACT GTC TAC 
ACC TGA CAG ATG 



Gly Pro Val Ile
GGT CCT GTC ATC
CCA GGA CAG TAG



170 

Trp Pro Ala Ser 

TGG CCC GCT TCG 
ACC GGG CGA AGC 



185 

Ser Ser Asp Leu 

TCC TCG GAC CTT 
AGG AGC CTG GAA 



200 

Arg Arg Arg Gly 

CGC CGG CGG GGT 
GCG GCC GCC CCA 
T 

NaeI
215 

Ser Tyr Leu Lys 
TCC TAC TTG AAA 
AGG ATG AAC TTT 



His Ala Val Gly 
CAC GCC GTG GGC 
GTG CGG CAC CCG 



250 

Lys Ala Val Asp 

AAG GCG GTG GAC 
TTC CGC CAC CTG 



140 

His Gly Ala Gly 
CAC GGG GCC GGA 
GTG CCC CGG CCT 



155 

Gln Met Tyr Thr

CAG ATG TAT ACC 
GTC TAC ATA TGG 



Gln Gly Thr Arg
CAA GGT ACC CGC 
GTT CCA TGG GCG 



190 

Tyr Leu Val Thr 

TAC CTG GTC ACG 
ATG GAC CAG TGC 



205 

Asp Ser Arg Gly 

GAT AGC AGG GGC 
CTA TCG TCC CCG 



220 

Gly Ser Ser Gly 

GGC TCC TCG GGG 
CCG AGG AGC CCC 



235 

Ile Phe Arg Ala

ATA TTT AGG GCC 
TAT AAA TCC CGG 



Phe Ile Pro Val

TTT ATC CCT GTG 
AAA TAG GGA CAC 



Figure 10 (continued)
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Glu Asn Leu Glu 
GAG AAC CTA GAG 
CTC TTG GAT CTC 



260 

Thr Thr Met Arg 

ACA ACC ATG AGG 
TGT TGG TAG TCC 




265 

Ser Pro Val Phe 

TCC CCC GTG TTC 
AGG GGC CAC AAG 



270 

Thr Asp Asn Ser 

ACG GAT AAC TCC 
TGC CTA TTG AGG 



Ser Pro Pro Val 

TCT CCA CCA GTA 
AGA GGT GGT CAT 



275 

Val Pro Gln Ser

GTG CCC CAG AGC 
CAC GGG GTC TCG 



280 

Phe Gln Val Ala
TTC CAG GTG GCT 
AAG GTC CAC CGA 



285 

His Leu His Ala 
CAC CTC CAT GCT 
GTG GAG GTA CGA 



290 295 300 

Pro Thr Gly Ser Gly Lys Ser Thr Lys Val Pro Ala Ala Tyr Ala Ala 

CCC ACA GGC AGC GGC AAA AGC ACC AAG GTC CCG GCT GCA TAT GCA GCT 
GGG TGT CCG TCG CCG TTT TCG TGG TTC CAG GGC CGA CGT ATA CGT CGA

t 

NdeI

305 310 315 

Gln Gly Tyr Lys Val Leu Val Leu Asn Pro Ser Val Ala Ala Thr Leu

CAG GGC TAT AAG GTG CTA GTA CTC AAC CCC TCT GTT GCT GCA ACA CTG 
GTC CCG ATA TTC CAC GAT CAT GAG TTG GGG AGA CAA CGA CGT TGT GAC 



320 

Gly Phe Gly Ala 
GGC TTT GGT GCT 
CCG AAA CCA CGA 



325 

Tyr Met Ser Lys

TAC ATG TCC AAG 
ATG TAC AGG TTC 



330 

Ala His Gly Ile
GCT CAT GGG ATC 
CGA GTA CCC TAG 



Asp Pro Asn Ile

GAT CCT AAC ATC 
CTA GGA TTG TAG 



335 

Arg Thr Gly Val 

AGG ACC GGG GTG 
TCC TGG CCC CAC 



340 

Arg Thr Ile Thr
AGA ACA ATT ACC 
TCT TGT TAA TGG 



345 

Thr Gly Ser Pro 

ACT GGC AGC CCC 
TGA CCG TCG GGG 



350 

Ile Thr Tyr Ser
ATC ACG TAC TCC 
TAG TGC ATG AGG 



Thr Tyr Gly Lys 

ACC TAC GGC AAG 
TGG ATG CCG TTC 



355 

Phe Leu Ala Asp 

TTC CTT GCC GAC 
AAG GAA CGG CTG 



360 

Gly Gly Cys Ser 
GGC GGG TGC TCG 
CCG CCC ACG AGC 



365 

Gly Gly Ala Tyr 

GGG GGC GCT TAT 
CCC CCG CGA ATA 



370 

Asp Ile Ile Ile
GAC ATA ATA ATT 
CTG TAT TAT TAA 



Cys Asp Glu Cys 

TGT GAC GAG TGC 
ACA CTG CTC ACG 



375 

His Ser Thr Asp 

CAC TCC ACG GAT 
GTG AGG TGC CTA 



380 

Ala Thr Ser Ile

GCC ACA TCC ATC 
CGG TGT AGG TAG 



Figure 10 (continued)
Leu Gly Ile Gly Thr Val Leu Asp

TTG GGC ATT GGC ACT GTC CTT GAC 
AAC CCG TAA CCG TGA CAG GAA CTG 
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395 

Gln Ala Glu Thr Ala Gly Ala Arg

CAA GCA GAG ACT GCG GGG GCG AGA
GTT CGT CTC TGA CGC CCC CGC TCT 



400 405 
Leu Val Val Leu Ala Thr Ala Thr 
CTG GTT GTG CTC GCC ACC GCC ACC 
GAC CAA CAC GAG CGG TGG CGG TGG 



410 

Pro Pro Gly Ser Val Thr Val Pro 

CCT CCG GGC TCC GTC ACT GTG CCC 
GGA GGC CCG AGG CAG TGA CAC GGG 



415 








His 


Pro 


Asn 


Ile


CAT 


CCC 


AAC 


ATC 


GTA 


GGG 


TTG 


TAG 



420 

Glu Glu Val Ala 
GAG GAG GTT GCT 
CTC CTC CAA CGA 



425 

Leu Ser Thr Thr 

CTG TCC ACC ACC 
GAC AGG TGG TGG 



430 

Gly Glu Ile Pro
GGA GAG ATC CCT 
CCT CTC TAG GGA 



Phe Tyr Gly Lys 

TTT TAC GGC AAG 
AAA ATG CCG TTC 



435 

Ala Ile Pro Leu

GCT ATC CCC CTC
CGA TAG GGG GAG 



440 

Glu Val Ile Lys

GAA GTA ATC AAG 
CTT CAT TAG TTC 



445 

Gly Gly Arg His 

GGG GGG AGA CAT 
CCC CCC TCT GTA 



450 

Leu Ile Phe Cys

CTC ATC TTC TGT
GAG TAG AAG ACA



His Ser Lys Lys 

CAT TCA AAG AAG 
GTA AGT TTC TTC 



455 

Lys Cys Asp Glu 

AAG TGC GAC GAA 
TTC ACG CTG CTT 



460 

Leu Ala Ala Lys 

CTC GCC GCA AAG 
GAG CGG CGT TTC 







465 




Leu 


Val 


Ala 


Leu 


CTG 


GTC 


GCA 


TTG 


GAC 


CAG 


CGT 


AAC 



470 

Gly Ile Asn Ala
GGC ATC AAT GCC 
CCG TAG TTA CGG 



Val Ala Tyr Tyr 
GTG GCC TAC TAC 
CAC CGG ATG ATG 



475 

Arg Gly Leu Asp 

CGC GGT CTT GAC 
GCG CCA GAA CTG 





480 






Val 


Ser 


Val 


Ile


GTG 


TCC 


GTC 


ATC 


CAC 


AGG 


CAG 


TAG 



485 

Pro Thr Ser Gly 

CCG ACC AGC GGC 
GGC TGG TCG CCG 



490 

Asp Val Val Val 

GAT GTT GTC GTC 
CTA CAA CAG CAG 



Val Ala Thr Asp 
GTG GCA ACC GAT 
CAC CGT TGG CTA 



Figure 10 (continued)



